Negative dependence and stochastic orderings


Fraser Daly* 


Abstract We explore negative dependence and stochastic orderings, showing that if an
integer-valued random variable $W$ satisfies a certain negative dependence assumption,
then $W$ is smaller (in the convex sense) than a Poisson variable of equal mean. Such $W$
include those which may be written as a sum of totally negatively dependent indicators.
This is generalised to other stochastic orderings. Applications include entropy bounds,
Poisson approximation and concentration. The proof uses thinning and size-biasing. We
also show how these give a different Poisson approximation result, which is applied to
mixed Poisson distributions. Analogous results for the binomial distribution are also
presented.

Key words and phrases: Thinning; size biasing; s-convex ordering; Poisson approximation;
entropy.

1 Introduction

Throughout this work we let $W$ be a non-negative, integer-valued random variable with
expectation $\lambda > 0$. We focus our attention here on those $W$ which satisfy a certain
negative dependence assumption, which we explicitly state in (2.1) below as a stochastic
ordering between $W + 1$ and the size-biased version of $W$. Random variables satisfying
this stochastic ordering occur naturally in many applications. For example, if we may
write $W$ as a sum of negatively related Bernoulli random variables, the assumption (2.1)
is satisfied. Examples of such sums appear in various urn models and occupancy problems.
Several explicit examples of random variables satisfying our negative dependence
assumption are discussed in Section 2.

We are motivated by the work of Daly et al. [8], who explore links between Stein’s
method for probability approximation and stochastic orderings. In their work, as here,
these stochastic orderings often reflect the dependence structure of the underlying random
variables. In particular, [8] shows that the stochastic ordering assumption we make here
implies a straightforward upper bound on the total variation distance between $W$ and a
Poisson random variable.

In this work (and in particular in Section 2 below), we explore further consequences
of our stochastic ordering assumption. In particular, we will see that our negative
dependence assumption leads naturally to bounds on the entropy of $W$, concentration
inequalities for $W$ and some further Poisson approximation results which complement and
enhance those of [8]. The bounds we derive on entropy generalise entropy maximisation
results of Johnson [16] and Yu [29]. See also [17]. Such results are useful, for example, in
understanding probabilistic limit theorems in an information theoretic context.

Our proofs will make use of the s-convex stochastic orders defined by Lefèvre and
Utev [18], which generalise the usual stochastic and convex orderings. We will also need
a lemma of Johnson [16] which links the operations of size-biasing and thinning. This is
stated as Lemma 1.1 later in this section and will be a key tool in what follows. Further
consequences of this lemma will be explored in Section 3, where we consider how these
thinning and size-biasing results may be applied to Poisson approximation both with
and without making any stochastic ordering assumptions. In particular, we will explore
Poisson approximation for a mixed Poisson random variable using these techniques.

The results and applications we consider in Sections 2 and 3 are closely related to
the Poisson distribution. This is natural, since Lemma 1.1 is itself closely related to the
Poisson distribution. We will also explore what can be said in relation to the binomial
distribution. This is done in Section 4. We seek the analogues of many of our other
results in this case. For example, under a somewhat different assumption on the
dependence structure of our random variable $W$ to that used in Section 2, we find binomial
approximation results and some further concentration inequalities and bounds on entropy.

We use the remainder of this section to introduce the notation and ideas common to
all the work that follows. We also state the lemma, due to Johnson [16], which forms the
key to many of the proofs that follow.

For any $\alpha \in [0,1]$, we define the thinning operator $T_\alpha$ by letting
\[ T_\alpha W = \sum_{i=1}^{W} \eta_i , \]
where $\eta_1, \eta_2, \ldots$ are iid Bernoulli random variables (independent of $W$) with mean $\alpha$.

Throughout this note, we will let $Z_\mu \sim \mathrm{Po}(\mu)$ have a Poisson distribution with mean
$\mu$. The main object we will study in the work that follows is the operator $U_\alpha$, given by
\[ U_\alpha W = T_\alpha W + Z_{(1-\alpha)\lambda} , \tag{1.1} \]
where $Z_{(1-\alpha)\lambda}$ is independent of all else. In what follows, for notational convenience we
will write $W_\alpha$ for a random variable equal in distribution to $U_\alpha W$ for $\alpha \in [0,1]$. We note
that $W_1$ is equal in distribution to $W$, and that $W_0 \sim \mathrm{Po}(\lambda)$.

It is easy to see that for any $\alpha \in [0,1]$ we have $\mathbb{E}[W_\alpha] = \mathbb{E}[W] = \lambda$. We also note that
for any $\alpha, \beta \in [0,1]$, $U_\beta(U_\alpha W)$ is equal in distribution to $U_{\alpha\beta} W$. Finally, it is useful to
note that $U_\alpha$ acts trivially on Poisson distributions. That is, $U_\alpha Z_\lambda$ is equal in distribution
to $Z_\lambda$ for any $\lambda > 0$ and $\alpha \in [0,1]$. Further properties of the operators $U_\alpha$, and their link
with the M/M/$\infty$ queue, are discussed in [16].
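The thinning-plus-Poisson construction above can be sketched numerically. The following is a minimal illustration, not part of the original argument: it computes the mass function of $U_\alpha W$ exactly (up to truncation of the Poisson part) for a small binomial $W$. The function names (`thin`, `U`, `poisson_pmf`, `binomial_pmf`, `mean`) and the truncation length `size` are assumptions made for this sketch.

```python
from math import comb, exp

def binomial_pmf(n, p):
    """pmf of Bin(n, p) on {0, ..., n}."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def poisson_pmf(mu, size):
    """pmf of Po(mu), truncated to {0, ..., size - 1}."""
    pmf = [exp(-mu)]
    for j in range(1, size):
        pmf.append(pmf[-1] * mu / j)
    return pmf

def mean(pmf):
    return sum(j * pj for j, pj in enumerate(pmf))

def thin(pmf, alpha):
    """pmf of T_alpha W: each of the W points is kept independently w.p. alpha."""
    out = [0.0] * len(pmf)
    for w, pw in enumerate(pmf):
        for k in range(w + 1):
            out[k] += pw * comb(w, k) * alpha**k * (1 - alpha)**(w - k)
    return out

def U(pmf, alpha, size):
    """pmf of U_alpha W = T_alpha W + Z_{(1 - alpha) * lambda}, truncated to `size` terms."""
    lam = mean(pmf)
    t = thin(pmf, alpha)
    z = poisson_pmf((1 - alpha) * lam, size)
    out = [0.0] * size
    for i, ti in enumerate(t):
        for j in range(size - i):
            out[i + j] += ti * z[j]
    return out

w = binomial_pmf(5, 0.4)            # hypothetical example: W ~ Bin(5, 0.4), lambda = 2
w_alpha = U(w, 0.3, 60)
print("mean of W_alpha:", mean(w_alpha))
```

With this sketch one can check the stated properties directly: the mean is unchanged, `U(w, 0.0, 60)` agrees with the Poisson mass function, and `U(U(w, 0.5, 60), 0.6, 60)` agrees with `U(w, 0.3, 60)` up to rounding.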


In what follows, we will also need to employ size biasing. For any non-negative,
integer-valued random variable $W$ with mean $\lambda > 0$, we let $W^*$ denote a random variable
with the $W$-size-biased distribution, with mass function given by
\[ \mathbb{P}(W^* = j) = \frac{j\,\mathbb{P}(W = j)}{\lambda} , \tag{1.2} \]
for any $j \in \mathbb{Z}^+ = \{0, 1, \ldots\}$. Equivalently, we may define $W^*$ by letting
\[ \mathbb{E}[W g(W)] = \lambda\,\mathbb{E}[g(W^*)] , \tag{1.3} \]
for all functions $g : \mathbb{Z}^+ \to \mathbb{R}$ for which the expectation exists. In a context similar to that
considered here, size biasing appears throughout Stein’s method for Poisson approximation:
we refer the interested reader to [^ , (^ , and references therein. Note that the work
we present here is completely distinct from Stein’s technique, however.

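We define the forward difference operator $\Delta$ and its inverse by writing $\Delta f(j) = f(j+1) - f(j)$
and $\Delta^{-1} f(j) = -\sum_{i=j}^{\infty} f(i)$ for any $f : \mathbb{Z}^+ \to \mathbb{R}$. Letting $\Delta^0 f(j) = f(j)$, we may
then define recursively $\Delta^n f(j) = \Delta(\Delta^{n-1} f(j))$ and $\Delta^{-n} f(j) = \Delta^{-1}(\Delta^{-n+1} f(j))$ for any
$n \ge 1$.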

We are now in a position to be able to state the following lemma, which appears as
Corollary 4.2 of [16].

Lemma 1.1. With $W_\alpha$ as above and $j \in \mathbb{Z}^+$,
\[ \frac{\partial}{\partial\alpha} \mathbb{P}(W_\alpha = j) = \frac{\lambda}{\alpha}\, \Delta\left[ \mathbb{P}(W_\alpha + 1 = j) - \mathbb{P}(W_\alpha^* = j) \right] . \]

Lemma 1.1 relates the operations of thinning and size biasing, and will be used in
establishing stochastic ordering and Poisson approximation results in Sections 2 and 3.
A result analogous to Lemma 1.1 will also be needed for the results established in the
binomial case and presented in Section 4.
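As a sanity check on the identity linking thinning and size biasing, the derivative in $\alpha$ can be compared with the right-hand side numerically. The sketch below is an illustration under assumptions: it reads the lemma as $\partial_\alpha \mathbb{P}(W_\alpha = j) = (\lambda/\alpha)\,\Delta[\mathbb{P}(W_\alpha + 1 = j) - \mathbb{P}(W_\alpha^* = j)]$ with $\Delta$ acting on $j$, approximates the derivative by a central finite difference with a hypothetical step `h`, and truncates the Poisson part at `size` terms. The helper names are invented for this sketch.

```python
from math import comb, exp

def U_pmf(pmf, alpha, size):
    """pmf of W_alpha = T_alpha W + Po((1 - alpha) * lambda), truncated to `size` terms."""
    lam = sum(j * pj for j, pj in enumerate(pmf))
    mu = (1 - alpha) * lam
    z = [exp(-mu)]
    for j in range(1, size):
        z.append(z[-1] * mu / j)
    out = [0.0] * size
    for w, pw in enumerate(pmf):
        for k in range(w + 1):
            t = pw * comb(w, k) * alpha**k * (1 - alpha)**(w - k)
            for j in range(size - k):
                out[k + j] += t * z[j]
    return out

def lemma_sides(pmf, alpha, size=60, h=1e-5):
    """Return (lhs, rhs): finite-difference derivative in alpha vs. the size-bias expression."""
    lam = sum(j * pj for j, pj in enumerate(pmf))
    up, down = U_pmf(pmf, alpha + h, size), U_pmf(pmf, alpha - h, size)
    lhs = [(a - b) / (2 * h) for a, b in zip(up, down)]
    p = U_pmf(pmf, alpha, size)
    # g(j) = P(W_alpha + 1 = j) - P(W_alpha^* = j), using P(W^* = j) = j P(W = j) / lambda
    g = [(p[j - 1] if j >= 1 else 0.0) - j * p[j] / lam for j in range(size)]
    rhs = [(lam / alpha) * (g[j + 1] - g[j]) for j in range(size - 1)]
    return lhs[:size - 1], rhs

w = [comb(5, k) * 0.4**k * 0.6**(5 - k) for k in range(6)]   # hypothetical W ~ Bin(5, 0.4)
lhs, rhs = lemma_sides(w, 0.5)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))             # discrepancy between the two sides
```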


2 Negative dependence and convex orderings

In this section we consider the relationship between negative dependence and stochastic
ordering. We will make use of the s-convex orderings, defined by Lefèvre and Utev [18]
for any integer $s \ge 1$. Letting $X$ and $Y$ be non-negative integer-valued random variables,
we write $X \le_{s\text{-cx}} Y$ if $\mathbb{E} f(X) \le \mathbb{E} f(Y)$ for all $f \in \mathcal{F}_s$, where
\[ \mathcal{F}_s = \left\{ f : \mathbb{Z}^+ \to \mathbb{R} \;\middle|\; \Delta^i f(j) \ge 0 \text{ for all } j \in \mathbb{Z}^+ \text{ and } i = 1, \ldots, s \right\} . \]
Note that the case $s = 1$ corresponds to the usual stochastic ordering (often denoted by
$X \le_{st} Y$ in what follows) and the case $s = 2$ is the increasing convex ordering, written
$X \le_{icx} Y$. For future use, we recall also the standard result that if $\mathbb{E}X = \mathbb{E}Y$ and
$X \le_{icx} Y$ then $X \le_{cx} Y$, where this denotes the usual convex ordering of such random
variables. The interested reader is referred to [25] for an introduction to the subject of
stochastic orderings.

Daly et al. [8] give bounds on the Poisson approximation of $W$ in total variation
distance under the assumption that
\[ W^* \le_{s\text{-cx}} W + 1 \tag{2.1} \]
for some $s \in \mathbb{N} = \{1, 2, \ldots\}$, where $W^*$ is defined by (1.2). The main result of this section
(Theorem 2.1) is that the ordering assumption (2.1) implies an ordering between $W$ and
a Poisson random variable of the same mean. This yields as an immediate corollary
some bounds on Poisson approximation for $W$ and a concentration inequality for $W$.
From Theorem 2.1 we may also derive an upper bound on the entropy of $W$, and hence
generalise results of [16] and [29].

Before proceeding further, we note that the stochastic ordering (2.1) with $s = 1$ is
closely related to well-known, often applied concepts of negative dependence. For example,
if $W = X_1 + \cdots + X_n$ for some (dependent) Bernoulli random variables $X_1, \ldots, X_n$ such
that
\[ \mathrm{Cov}\left( f(X_i), g(W - X_i) \right) \le 0 , \tag{2.2} \]
for each $i$ and all increasing functions $f, g : \mathbb{Z}^+ \to \mathbb{R}$, then $W + 1 \ge_{st} W^*$. See [l^ and
[sl, where the property (2.2) is referred to as total negative dependence.

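Recall that Bernoulli random variables $X_1, \ldots, X_n$ are said to be negatively related if
\[ \mathbb{E}\left[ \phi(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) \mid X_i = 1 \right] \le \mathbb{E}\left[ \phi(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) \right] , \tag{2.3} \]
for each $i$ and all increasing functions $\phi : \{0,1\}^{n-1} \to \mathbb{R}$.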

Papadatos and Papathanasiou [l^ showed that if $X_1, \ldots, X_n$ are negatively related
then (2.2) holds, and hence the stochastic ordering (2.1) holds with $s = 1$. There are thus
many examples and applications which fit into this framework. We give some illustrative
examples below. In each of these examples the random variable $W$ may be written as a
sum of negatively related Bernoulli variables, and therefore satisfies $W + 1 \ge_{st} W^*$. The
negative relation property may be established by a straightforward and natural coupling
argument in each case.

(i) If $W = X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are independent Bernoulli random variables
then clearly (2.3) holds.

(ii) If $W$ has a hypergeometric distribution then Barbour et al. [^, Section 6.1] show
that $W$ may be written as a sum of negatively related Bernoulli random variables.

(iii) More generally, if we distribute $m$ balls uniformly into $n$ urns and let $W$ count the
number of urns which contain at least $c$ balls, Papadatos and Papathanasiou [l^
Section 4] show that $W$ may be written as a sum of negatively related Bernoulli
random variables.

(iv) Suppose we have an urn which initially contains balls of $n$ different colours. We
proceed by Pólya sampling: on each of $m$ draws we choose a ball uniformly from
the urn, note its colour and return it to the urn along with an additional ball of
the same colour. Let $X_i$ be the indicator that no ball of colour $i$ was seen during
these $m$ draws. Then $X_1, \ldots, X_n$ are negatively related: see Section 6.3]. Here
$W = X_1 + \cdots + X_n$ counts the total number of colours not seen during the $m$ draws.

(v) Consider the following matrix occupancy problem. Suppose we have an $r \times n$ matrix
and in row $k$ we place $s_k$ 1s, their positions being chosen by uniform sampling
without replacement. All remaining entries of the matrix are set to 0. Let $T_i$ count
the number of 1s in column $i$ and $X_i = I(T_i \le m)$, the indicator that column $i$
contains at most $m$ nonzero entries. Then $W = X_1 + \cdots + X_n$ counts the number of
such columns. Barbour et al. show in Section 6.4 that $X_1, \ldots, X_n$ are negatively
related.

(vi) Distribute $n$ points uniformly on the circumference of a circle. Let $S_1, \ldots, S_n$ be
the arc-length distances between adjacent points and $X_i = I(S_i < a)$, the indicator
that $S_i$ falls below some threshold $a$. Then j^, Section 7.1] shows that $X_1, \ldots, X_n$
are negatively related. Their sum $W$ counts the number of small spacings on our
circle.

(vii) Let $(\sigma_1, \ldots, \sigma_n)$ be a permutation of $\{1, \ldots, n\}$ drawn uniformly from the group
of such permutations. Let $X_i = I(\sigma_i < a_i)$ for some given $a_1, \ldots, a_n$, and $W =
X_1 + \cdots + X_n$. In Section 4.1, shows that $X_1, \ldots, X_n$ are negatively related.


In each of these examples we may apply the results of this section. For further discussion
of these examples, and many others, we refer the reader to i, Q i , and references
therein.
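The ordering $W + 1 \ge_{st} W^*$ is easy to test numerically for a given mass function, since it only requires comparing tail probabilities. The following sketch is illustrative only: it checks the ordering for a hypergeometric $W$ (as in example (ii)) and shows a hypothetical two-point distribution for which it fails. The function names and the tolerance `tol` are assumptions of this sketch.

```python
from math import comb

def hypergeometric_pmf(N, m, n):
    """P(W = k), k = 0..n: W counts occupied urns among the first n when m of N urns are filled."""
    return [comb(n, k) * comb(N - n, m - k) / comb(N, m) if 0 <= m - k <= N - n else 0.0
            for k in range(n + 1)]

def size_bias(pmf):
    """Mass function of W^*: P(W^* = j) = j P(W = j) / E[W]."""
    lam = sum(j * pj for j, pj in enumerate(pmf))
    return [j * pj / lam for j, pj in enumerate(pmf)]

def tail(pmf):
    """tail(pmf)[j] = P(X >= j)."""
    out, acc = [], 0.0
    for pj in reversed(pmf):
        acc += pj
        out.append(acc)
    return out[::-1]

def satisfies_ordering(pmf, tol=1e-12):
    """True if P(W^* >= j) <= P(W + 1 >= j) for every j, i.e. W^* <=_st W + 1."""
    star = tail(size_bias(pmf)) + [0.0]   # P(W^* >= j) for j = 0..len(pmf)
    plus_one = [1.0] + tail(pmf)          # P(W + 1 >= j) = P(W >= j - 1)
    return all(a <= b + tol for a, b in zip(star, plus_one))

print(satisfies_ordering(hypergeometric_pmf(20, 7, 5)))
# A hypothetical counter-example: W equal to 0 or 10 with probability 1/2 each.
print(satisfies_ordering([0.5] + [0.0] * 9 + [0.5]))
```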

We now state the main result of this section, Theorem 2.1. In Theorem 2.2 we give
a slightly stronger result for the case $s = 1$. The proofs of these theorems are deferred
until Section 2.4, before which we consider some applications and corollaries. Note that
throughout what follows we let $\binom{a}{b} = 0$ if $b > a$.


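Theorem 2.1. Let $W$ be a non-negative, integer-valued random variable with $\mathbb{E}[W] =
\lambda > 0$ and
\[ \mathbb{E}\binom{W}{k} \le \mathbb{E}\binom{Z_\lambda}{k} , \qquad k = 3, \ldots, s , \]
for some $s \in \mathbb{N}$, where $Z_\lambda \sim \mathrm{Po}(\lambda)$. Let $W^*$ be defined by (1.2). If $W^* \le_{s\text{-cx}} W + 1$ then
$W \le_{(s+1)\text{-cx}} Z_\lambda$.

Theorem 2.2. Let $W$ be a non-negative, integer-valued random variable with $\mathbb{E}[W] =
\lambda > 0$. Let $W^*$ be defined by (1.2). If $W^* \le_{st} W + 1$ then $W_\alpha \le_{cx} W_\beta$ for $\alpha > \beta$. In
particular, $W \le_{cx} Z_\lambda$, where $Z_\lambda \sim \mathrm{Po}(\lambda)$.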
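The convex ordering $W \le_{cx} Z_\lambda$ can be illustrated numerically. The sketch below relies on a standard characterisation that is not taken from the text: for integer-valued random variables with equal means, $X \le_{cx} Y$ holds exactly when $\mathbb{E}(X - t)_+ \le \mathbb{E}(Y - t)_+$ for every integer $t \ge 0$. The example $W \sim \mathrm{Bin}(6, 1/2)$, the truncation of the Poisson mass function, and all function names are assumptions of this sketch.

```python
from math import comb, exp

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def poisson_pmf(mu, size):
    pmf = [exp(-mu)]
    for j in range(1, size):
        pmf.append(pmf[-1] * mu / j)
    return pmf

def stop_loss(pmf, t):
    """E (X - t)_+ for X with the given mass function."""
    return sum((j - t) * pj for j, pj in enumerate(pmf) if j > t)

def cx_leq(p, q, tol=1e-12):
    """Check X <=_cx Y via equal means and ordered stop-loss transforms at integer t."""
    mean_p = sum(j * x for j, x in enumerate(p))
    mean_q = sum(j * x for j, x in enumerate(q))
    if abs(mean_p - mean_q) > 1e-9:
        return False
    return all(stop_loss(p, t) <= stop_loss(q, t) + tol
               for t in range(max(len(p), len(q))))

w = binomial_pmf(6, 0.5)       # sum of independent indicators, so W^* <=_st W + 1
z = poisson_pmf(3.0, 80)       # Poisson with the same mean, truncated at 80 terms
print(cx_leq(w, z), cx_leq(z, w))
```

The second call reverses the roles of the two distributions, which gives a simple check that the comparison is not vacuous.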


2.1 Applications to bounds on entropy

We use this section to give some applications of our Theorem 2.2 to upper bounds for
entropy. The bounds we establish generalise results of [16] and [29]. See also [17]. We
define the entropy $H(W)$ of a non-negative, integer-valued random variable $W$ in the
usual way, although for convenience we take natural logarithms:
\[ H(W) = -\sum_{i=0}^{\infty} \mathbb{P}(W = i) \log\left( \mathbb{P}(W = i) \right) . \]

For the random variables we consider here, results are stated which compare their entropy
to that of a Poisson random variable with the same mean. Although no closed-form
expression exists for $H(Z_\lambda)$, there are several bounds on this quantity available in the
literature. For example, there is the well-known bound
\[ H(Z_\mu) \le \frac{1}{2} \log\left( 2\pi e \left( \mu + \frac{1}{12} \right) \right) . \]

In the results that follow, we will also need the notion of log-concavity for a non-negative,
integer-valued random variable. Recall that such a random variable $W$ is log-concave
if its support is an interval in $\mathbb{Z}^+$ and its mass function forms a log-concave
sequence. That is,
\[ \mathbb{P}(W = i)^2 \ge \mathbb{P}(W = i - 1)\, \mathbb{P}(W = i + 1) , \]
for all integers $i \ge 1$.

Corollary 2.3. Let $W$ be a non-negative, integer-valued random variable with $\mathbb{E}[W] =
\lambda > 0$. Let $Z_\lambda \sim \mathrm{Po}(\lambda)$. If $W + 1 \ge_{st} W^*$ then
\[ H(W) \le H(Z_\lambda) . \tag{2.4} \]

Proof. Since $W \le_{cx} Z_\lambda$ (by Theorem 2.2) and $Z_\lambda$ is a log-concave random variable, the
result follows from Lemma 1 of [29]. □

Corollary 2.3 shows that $Z_\lambda$ maximises the entropy within our class of $W$ with
expectation $\lambda$ and such that $W + 1 \ge_{st} W^*$. We again note that the conclusion of Corollary
2.3 holds if $W$ may be written as a sum of totally negatively dependent (or negatively
related) Bernoulli random variables, as in our examples above. Such maximum entropy
results are of importance in understanding probabilistic limit theorems in an information 
theoretic context. For further discussion of this, we refer the reader to and references 
therein. 
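The maximum entropy statement can be illustrated with a short computation. This is a sketch under assumptions: it takes $W \sim \mathrm{Bin}(10, 0.3)$ as a hypothetical example of a sum of independent indicators, computes entropies in nats from truncated mass functions, and also evaluates the classical upper bound $\frac{1}{2}\log(2\pi e(\mu + 1/12))$ on the Poisson entropy. The function names are invented for the illustration.

```python
from math import comb, exp, log, pi, e

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def poisson_pmf(mu, size):
    pmf = [exp(-mu)]
    for j in range(1, size):
        pmf.append(pmf[-1] * mu / j)
    return pmf

def entropy(pmf):
    """Shannon entropy in nats; zero-probability atoms contribute nothing."""
    return -sum(p * log(p) for p in pmf if p > 0)

def poisson_entropy_bound(mu):
    """Classical bound (1/2) log(2 pi e (mu + 1/12)) on the entropy of Po(mu)."""
    return 0.5 * log(2 * pi * e * (mu + 1 / 12))

h_w = entropy(binomial_pmf(10, 0.3))      # W with mean 3
h_z = entropy(poisson_pmf(3.0, 80))       # Poisson with the same mean, truncated
print(h_w, h_z, poisson_entropy_bound(3.0))
```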

Corollary 2.3 generalises Theorem 2.5 of [16], which states that (2.4) holds under the
assumption that $W$ is ultra log-concave (of degree $\infty$), denoted ULC($\infty$) in what follows.
Recall that $W$ is ULC($\infty$) if
\[ \left( (j+1)!\, \mathbb{P}(W = j+1) \right)^2 \ge j!\,(j+2)!\, \mathbb{P}(W = j)\, \mathbb{P}(W = j+2) , \qquad j \ge 0 , \]
or, equivalently, if
\[ \frac{(j+1)\, \mathbb{P}(W = j+1)}{\mathbb{P}(W = j)} \]
is decreasing in $j$. We note that this is equivalent to $W + 1 \ge_{lr} W^*$, where ‘$\ge_{lr}$’ denotes
the likelihood ratio ordering. Since this is stronger than stochastic ordering [25,
Theorem 1.C.1], our Corollary 2.3 strengthens Theorem 2.5 of [16]. Similarly, Corollary 2.4 below
generalises Theorem 3 of [29]. See also [17].


Corollary 2.4. Let $W$ be a non-negative, integer-valued random variable such that $\mathbb{E}[W] =
\lambda > 0$ and $W + 1 \ge_{st} W^*$. Let $Z_\lambda \sim \mathrm{Po}(\lambda)$ and $X_1, X_2, \ldots$ be iid non-negative,
integer-valued random variables. Let
\[ \widetilde{W} = \sum_{i=1}^{W} X_i \quad \text{and} \quad \widetilde{Z}_\lambda = \sum_{i=1}^{Z_\lambda} X_i . \]
If $\widetilde{Z}_\lambda$ is log-concave, then $H(\widetilde{W}) \le H(\widetilde{Z}_\lambda)$.

Proof. Combine our Theorem 2.2 with Theorem 1 of [29]. □

For discussion of the log-concavity assumption used in this result (including some
sufficient conditions for $\widetilde{Z}_\lambda$ to be log-concave), we refer the reader to Section 3 of [29] and
Section 5 of [17]. In particular, [29, Theorem 4] shows that if $X_1$ is log-concave and
\[ \lambda\, \mathbb{P}(X_1 = 1)^2 \ge 2\, \mathbb{P}(X_1 = 2) , \]
then $\widetilde{Z}_\lambda$ is log-concave.


Johnson [16] goes further than establishing that the Poisson distribution maximises
entropy within the class of ULC($\infty$) random variables of mean $\lambda$. In his Theorem 5.1
he shows that for such $W$ the entropy of $W_\alpha$ is a decreasing and concave function of $\alpha$.
Using our stochastic ordering arguments we may also generalise this result, and show it
applies to $W$ satisfying $W + 1 \ge_{st} W^*$. This is done in Theorem 2.5.

Theorem 2.5. Let $W$ be a non-negative, integer-valued random variable satisfying $W +
1 \ge_{st} W^*$, where $W^*$ is defined by (1.2). Then
\[ \frac{\partial}{\partial\alpha} H(W_\alpha) \le 0 \quad \text{and} \quad \frac{\partial^2}{\partial\alpha^2} H(W_\alpha) \le 0 , \tag{2.5} \]
with equality if and only if $W$ has a Poisson distribution.

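Proof. Our proof uses many of the same components as that of Theorem 5.1 of [16], but
replaces the arguments based on ultra log-concavity with stochastic ordering results.
Following [16] we decompose the entropy as
\[ H(W_\alpha) = \Lambda(W_\alpha) - D(W_\alpha \| Z_\lambda) , \]
where $Z_\lambda \sim \mathrm{Po}(\lambda)$,
\[ \Lambda(W_\alpha) = -\sum_{j=0}^{\infty} \mathbb{P}(W_\alpha = j) \log\left( \mathbb{P}(Z_\lambda = j) \right) , \]
\[ D(W_\alpha \| Z_\lambda) = \sum_{j=0}^{\infty} \mathbb{P}(W_\alpha = j) \log\left( \frac{\mathbb{P}(W_\alpha = j)}{\mathbb{P}(Z_\lambda = j)} \right) . \]
Note that $D$ here is the relative entropy. Lemmas 5.2 and 5.5 of [16] give us immediately
that
\[ \frac{\partial}{\partial\alpha} D(W_\alpha \| Z_\lambda) \ge 0 \quad \text{and} \quad \frac{\partial^2}{\partial\alpha^2} D(W_\alpha \| Z_\lambda) \ge 0 , \]
since for $W$ such that $W + 1 \ge_{st} W^*$ we have $\mathrm{Var}(W) \le \mathbb{E}[W]$.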

To prove (2.5), it remains only to show that $\Lambda(W_\alpha)$ is a decreasing and concave function
of $\alpha$. By equation (15) of [16] we have that
\[ \frac{\partial}{\partial\alpha} \Lambda(W_\alpha) = \frac{\lambda}{\alpha} \left\{ \mathbb{E}\log(W_\alpha^*) - \mathbb{E}\log(W_\alpha + 1) \right\} . \tag{2.6} \]
We will see in Section 2.4 that $W$ such that $W + 1 \ge_{st} W^*$ satisfy the ordering $W_\alpha + 1 \ge_{st}
W_\alpha^*$ for each $\alpha \in [0,1]$. Since $\log(\cdot)$ is an increasing function, it immediately follows from
(2.6) that $\Lambda(W_\alpha)$ is a decreasing function of $\alpha$.

Similarly, from Lemma 5.3 of [16] we have that
\[ \frac{\partial^2}{\partial\alpha^2} \Lambda(W_\alpha) = \left( \frac{\lambda}{\alpha} \right)^2 \left\{ \mathbb{E} f(W_\alpha^*) - \mathbb{E} f(W_\alpha + 1) \right\} , \]
where
\[ f(j) = \frac{j-1}{\lambda} \log\left( \frac{j}{j-1} \right) - \log\left( \frac{j+1}{j} \right) \]
(with the first term read as 0 when $j = 1$). Since $f(\cdot)$ is an increasing function, we see
that $\Lambda(W_\alpha)$ is a concave function of $\alpha$, completing the proof of (2.5).

The fact that equality holds in (2.5) if and only if $W$ has a Poisson distribution is
shown in the same way as the corresponding statement in Theorem 5.1 of [16]. □

We have already discussed several examples in which the results of this section may
be directly applied. We conclude with an example where we may use our results even
without the negative dependence assumption (2.1): the lightbulb process. This model
was introduced by Rao et al. [21], and is motivated by the pharmaceutical problem of a
dermal patch designed to target $n$ receptors. Each receptor is in one of two states. On
each day $r = 1, \ldots, n$, the patch causes $r$ uniformly selected receptors to switch state.

This process has also been studied, for example, by Goldstein and Zhang [l^, and
Goldstein and Xia H-. See also references therein. It is more often described in terms
of lightbulbs being switched on and off, with $r$ of the $n$ lightbulbs chosen uniformly to
have their state switched at day $r$, for $r = 1, \ldots, n$. For concreteness, we assume that all
$n$ bulbs are switched off at the start of the process. The random variable of interest is
$W = W(n)$, the number of bulbs switched on after day $n$. We consider here the problem
of bounding the entropy of $W$.

Goldstein and Zhang [l^ show that (at least for $n$ even) $W^* \le_{st} W + 2$, but this is
not enough to apply our results directly. Instead, we use the fact, shown by Goldstein
and Xia |l^, that $W$ is asymptotically distributed as a clubbed binomial distribution.
If we let $X \sim \mathrm{Bin}(n - 1, 1/2)$ have a binomial distribution, then we define the clubbed
binomial random variable $Y = Y_m$ by writing
\[ \mathbb{P}(Y_m = j) = \begin{cases} \mathbb{P}(X = j - 1) + \mathbb{P}(X = j) & \text{if } m \text{ and } j \text{ have the same parity,} \\ 0 & \text{otherwise.} \end{cases} \]
That is, the clubbed binomial is formed by combining the mass of the binomial distribution
at adjacent integers, so that it is supported on the lattice of non-negative integers with
the same parity as $m$. We note that the support of these clubbed binomial distributions
is appropriate to the problem at hand since, as shown by Rao et al. [21], if $n \equiv 0 \pmod 4$
or $n \equiv 3 \pmod 4$ then $W$ is supported on the set of even integers at most $n$. Otherwise,
the support of $W$ is the set of odd integers at most $n$. In what follows, we always choose
$m$ in the definition of $Y$ appropriately for the $n$ under consideration.
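The clubbed binomial construction is straightforward to compute. The following minimal sketch (function names are assumptions) builds the mass function of $Y_m$ from that of $X \sim \mathrm{Bin}(n-1, 1/2)$ by merging adjacent atoms, and compares the two entropies. Since $Y_m$ is obtained from $X$ by a deterministic merging of atoms, its entropy cannot exceed that of $X$.

```python
from math import comb, log

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def clubbed_binomial_pmf(n, m):
    """P(Y_m = j) for j = 0..n: merge P(X = j - 1) and P(X = j), X ~ Bin(n - 1, 1/2),
    onto the integers j with the same parity as m."""
    x = binomial_pmf(n - 1, 0.5)

    def px(j):
        return x[j] if 0 <= j <= n - 1 else 0.0

    return [px(j - 1) + px(j) if (j - m) % 2 == 0 else 0.0 for j in range(n + 1)]

def entropy(pmf):
    return -sum(p * log(p) for p in pmf if p > 0)

n = 8                                  # n = 0 (mod 4): even support, so take m even
y = clubbed_binomial_pmf(n, 0)
print(sum(y), entropy(y), entropy(binomial_pmf(n - 1, 0.5)))
```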

We begin with the straightforward observation that $H(Y) \le H(X)$. This follows
immediately from the definition of $Y$. Since our binomial distribution $X$ satisfies $X^* \le_{st}
X + 1$, it follows from Corollary 2.3 that
\[ H(Y) \le H(Z_{(n-1)/2}) , \tag{2.7} \]
where $Z_\lambda \sim \mathrm{Po}(\lambda)$ as usual. Hence, using (2.7), we have that
\[ H(W) \le H(Z_{(n-1)/2}) + |H(W) - H(Y)| . \]

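This last term may be bounded using Theorem 17.3.3 of j^, which states that if $W$ and
$Y$ are random variables supported on a subset of $\mathbb{Z}^+$ of size $k$ and
\[ \sum_{j \in \mathbb{Z}^+} |\mathbb{P}(W = j) - \mathbb{P}(Y = j)| \le \beta \le \frac{1}{2} , \]
then
\[ |H(W) - H(Y)| \le -\beta \log\left( \frac{\beta}{k} \right) . \]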

We may apply this result here with the choice
\[ \beta = 5.47 \sqrt{n} \exp\left( -\frac{n+1}{3} \right) , \tag{2.8} \]
by Theorem 3.1 of 1^, noting that $\beta \le 1/2$ for $n \ge 10$. Since both $W$ and $Y$ are
supported on either the even or odd integers up to $n$, we may take $k = (n/2) + 1$. Hence
we obtain the following.

Corollary 2.6. Let $W = W(n)$ be the number of bulbs switched on at the terminal time
of the lightbulb process. Then, with $n \ge 10$ and $\beta$ given by (2.8),
\[ H(W) \le H(Z_{(n-1)/2}) - \beta \log\left( \frac{2\beta}{n+2} \right) . \]
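To see the size of the correction term in this entropy bound, the right-hand side can be evaluated numerically. The sketch below is an illustration under assumptions: it reads (2.8) as $\beta = 5.47\sqrt{n}\,e^{-(n+1)/3}$ and the bound as $H(Z_{(n-1)/2}) - \beta\log(2\beta/(n+2))$, and it computes the Poisson entropy from a truncated mass function. The function names and the truncation rule are invented for this sketch.

```python
from math import exp, log, sqrt

def poisson_entropy(mu, size=None):
    """Entropy (in nats) of Po(mu), computed from a truncated mass function."""
    if size is None:
        size = int(mu + 20 * sqrt(mu) + 50)   # heuristic truncation point
    p, total = exp(-mu), 0.0
    for j in range(size):
        if p > 0:
            total -= p * log(p)
        p = p * mu / (j + 1)
    return total

def beta(n):
    """The quantity in (2.8), as read in this sketch."""
    return 5.47 * sqrt(n) * exp(-(n + 1) / 3)

def lightbulb_entropy_bound(n):
    """Upper bound on H(W(n)) for n >= 10, following the displayed corollary."""
    if n < 10:
        raise ValueError("the bound is stated for n >= 10")
    b = beta(n)
    return poisson_entropy((n - 1) / 2) - b * log(2 * b / (n + 2))

for n in (10, 20, 40):
    print(n, beta(n), lightbulb_entropy_bound(n))
```

Because $\beta$ decays exponentially in $n$, the correction to the Poisson entropy becomes negligible quickly as $n$ grows.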


2.2 Applications to Poisson approximation

For further applications of our Theorems 2.1 and 2.2 we turn to some Poisson
approximation results. For use here and in Section 3 we define the probability metrics we will
use. In this framework, we are inspired by the recent work of Röllin and Ross [22]. For
$1 \le p < \infty$ and $f : \mathbb{Z} \to \mathbb{R}$ we let
\[ \|f\|_p = \left( \sum_{j \in \mathbb{Z}} |f(j)|^p \right)^{1/p} , \]
and we let $\|f\|_\infty = \sup_j |f(j)|$. For distribution functions $F$ and $G$, we then define the
distances
\[ d_{n,p}(F, G) = \|\Delta^n F - \Delta^n G\|_p , \]
for $1 \le p \le \infty$ and $n \in \mathbb{Z}$. Many commonly-used probability metrics fit into this
framework. For example,

• the total variation distance: $d_{TV}(F, G) = \frac{1}{2} d_{1,1}(F, G)$.

• the Kolmogorov distance: $d_K(F, G) = d_{0,\infty}(F, G)$.

• the Wasserstein distance: $d_W(F, G) = d_{0,1}(F, G)$.

• the stop-loss distance: $d_{SL}(F, G) = d_{-1,\infty}(F, G)$.

Note also that $d_{1,\infty}$ is a metric useful in proving local limit theorems.
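These distances can be computed directly from two mass functions on the non-negative integers. The sketch below is illustrative and makes some assumptions: distribution functions are extended by zero to $j = -1$ so that the first difference recovers every atom, the inverse difference is taken as minus a tail sum, and the inputs have finite (or truncated) support. For negative $n$ with $p < \infty$ the result is only meaningful when the relevant moments of the two distributions agree. The function name `d` is invented here.

```python
from math import inf

def d(n, p, pmf_f, pmf_g):
    """d_{n,p}(F, G) = || Delta^n F - Delta^n G ||_p for pmfs on {0, 1, ...}."""
    size = max(len(pmf_f), len(pmf_g))
    f = list(pmf_f) + [0.0] * (size - len(pmf_f))
    g = list(pmf_g) + [0.0] * (size - len(pmf_g))
    # cur[i] holds F(j) - G(j) for j = i - 1, starting from F(-1) - G(-1) = 0
    cur, acc = [0.0], 0.0
    for a, b in zip(f, g):
        acc += a - b
        cur.append(acc)
    for _ in range(max(n, 0)):            # forward differences
        cur = [cur[i + 1] - cur[i] for i in range(len(cur) - 1)]
    for _ in range(max(-n, 0)):           # inverse differences: minus the tail sums
        tails, acc = [], 0.0
        for x in reversed(cur):
            acc += x
            tails.append(-acc)
        cur = tails[::-1]
    if p == inf:
        return max(abs(x) for x in cur)
    return sum(abs(x) ** p for x in cur) ** (1 / p)

f, g = [0.5, 0.5], [1.0, 0.0]             # Bernoulli(1/2) versus a point mass at 0
print("total variation:", 0.5 * d(1, 1, f, g))
print("Kolmogorov:", d(0, inf, f, g))
print("Wasserstein:", d(0, 1, f, g))
print("stop-loss:", d(-1, inf, f, g))
```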

To provide an illustration of the type of Poisson approximation result which may be
obtained in our framework, in this section we will consider approximation in the metrics
$d_{-k,\infty}$ for $k \ge -1$. The results of Section 3 below will use some of the other probability
metrics we have defined. In the work of this section, we are motivated by the techniques
and results of [12].

Corollary 2.7. Let $W$ be as in Theorem 2.1. If $W$ has distribution function $F$ and $Z_\lambda$
has distribution function $G_\lambda$ then


d_k,oo{F.Gx) 


Z\ -\- s + 1 
s + 1 


IT s -Fl 
5 “t“ 1 


for $k = -1, \ldots, s + 1$.

Proof. The result follows from Theorem 2.1 and an argument analogous to that for
Corollary 3.14 of [ij. □

If we take $s = 1$ in Corollary 2.7 we obtain that, for any $W$ with $W + 1 \ge_{st} W^*$,
\[ d_{-k,\infty}(F, G_\lambda) \le \lambda - \mathrm{Var}(W) , \tag{2.9} \]
for $k \in \{-1, 0, 1, 2\}$ (and hence including bounds on the stop-loss, Kolmogorov and local
limit distances). This applies in particular if $W$ may be written as a sum of totally
negatively dependent Bernoulli random variables, as in the examples discussed previously.
We conclude this section with a short example to illustrate this result.


Example 2.8. Suppose we distribute $m$ balls uniformly among $N > 1$ urns, where each
urn has the capacity for up to one ball. Let $W$ count the number of the first $n$ urns
that are occupied. Then $W$ has a hypergeometric distribution with mean $\lambda = mn/N$ and
variance
\[ \frac{mn}{N} \left( \frac{N - n}{N - 1} \right) \left( 1 - \frac{m}{N} \right) . \]
As noted earlier, $W$ may be written as a sum of negatively related Bernoulli random
variables and so satisfies $W + 1 \ge_{st} W^*$. The bound (2.9) then gives
\[ d_{-k,\infty}(F, G_\lambda) \le \frac{mn}{N} \left( 1 - \frac{(N - n)(N - m)}{N(N - 1)} \right) , \]
for $k \in \{-1, 0, 1, 2\}$, where $F$ is the distribution function of $W$ and $G_\lambda$ the distribution
function of a Poisson random variable with mean $\lambda$.

We note that upper bounds of a better order may be available. For example, let $k = 0$
(so that we consider the Kolmogorov distance) and suppose that $m$ and $n$ are both of
order $O(N)$. Then our upper bound is also of order $O(N)$, but an upper bound of better
order $O(1)$ is available from Theorem 6.A of Barbour et al. [^. However, our results have
the advantage of dealing simultaneously with a range of probability metrics.
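The hypergeometric example can be checked numerically for particular parameter values. This sketch is an illustration under assumptions: it reads (2.9) as the bound $\lambda - \mathrm{Var}(W)$, picks hypothetical values $N = 50$, $m = n = 10$, and compares the bound with the actual Kolmogorov distance ($k = 0$) to a truncated Poisson distribution. All function names are invented here.

```python
from math import comb, exp

def hypergeometric_pmf(N, m, n):
    """P(W = k): number of occupied urns among the first n, with m balls in N urns."""
    return [comb(n, k) * comb(N - n, m - k) / comb(N, m) if 0 <= m - k <= N - n else 0.0
            for k in range(n + 1)]

def poisson_pmf(mu, size):
    pmf = [exp(-mu)]
    for j in range(1, size):
        pmf.append(pmf[-1] * mu / j)
    return pmf

def kolmogorov(p, q):
    """sup_j |F(j) - G(j)| for two mass functions on {0, 1, ...}."""
    size = max(len(p), len(q))
    p = list(p) + [0.0] * (size - len(p))
    q = list(q) + [0.0] * (size - len(q))
    best, acc = 0.0, 0.0
    for a, b in zip(p, q):
        acc += a - b
        best = max(best, abs(acc))
    return best

def example_2_8(N, m, n, size=200):
    lam = m * n / N
    var = lam * (N - n) / (N - 1) * (1 - m / N)
    bound = lam - var                                   # right-hand side of (2.9), as read here
    dist = kolmogorov(hypergeometric_pmf(N, m, n), poisson_pmf(lam, size))
    return lam, var, bound, dist

print(example_2_8(50, 10, 10))
```

For these parameter values the true Kolmogorov distance is much smaller than the bound, in line with the remark that bounds of a better order are available for individual metrics.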
2.3 Applications to concentration inequalities

In this section we note that the convex ordering of Theorem 2.2 implies a concentration
inequality for $W$.

Corollary 2.9. Let $W$ be a non-negative, integer-valued random variable such that $\mathbb{E}W =
\lambda$ and $W + 1 \ge_{st} W^*$. Let $t > 0$. Then
\[ \mathbb{P}(W \ge \lambda + t) \le e^{t} \left( 1 + \frac{t}{\lambda} \right)^{-(t + \lambda)} , \]
\[ \mathbb{P}(W \le \lambda - t) \le e^{-t} \left( 1 - \frac{t}{\lambda} \right)^{t - \lambda} , \]
where the latter bound applies if $t < \lambda$.

Proof. To prove the first inequality, let $\theta > 0$ and note that (by a standard argument
using Markov’s inequality)
\[ \mathbb{P}(W - \lambda \ge t) \le \mathbb{E} e^{\theta W} \exp\left\{ -\theta (t + \lambda) \right\} . \]
Now, for $\theta > 0$, the function $e^{\theta x}$ is convex in $x$ and hence we apply Theorem 2.2 to note
that
\[ \mathbb{E} e^{\theta W} \le \exp\left\{ \lambda \left( e^{\theta} - 1 \right) \right\} . \]
We then minimize the resulting bound over $\theta$. The proof of the second inequality is
similar. □

These inequalities have also been found in recent work by Arratia and Baxendale
[2, Theorem 4.2], who show they perform well compared to other such concentration
inequalities which are available.
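The two tail bounds are easy to evaluate and compare with exact tail probabilities. The sketch below is illustrative: it takes the bounds in the Poisson (Chernoff) form obtained by optimising over $\theta$ as in the proof, namely $e^{t}(1 + t/\lambda)^{-(t+\lambda)}$ and $e^{-t}(1 - t/\lambda)^{t-\lambda}$, and uses a hypothetical $W \sim \mathrm{Bin}(20, 1/4)$ as a sum of independent indicators. The function names are assumptions.

```python
from math import comb, exp

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def upper_tail_bound(lam, t):
    """Bound on P(W >= lam + t) for t > 0."""
    return exp(t) * (1 + t / lam) ** (-(t + lam))

def lower_tail_bound(lam, t):
    """Bound on P(W <= lam - t) for 0 < t < lam."""
    if not 0 < t < lam:
        raise ValueError("requires 0 < t < lam")
    return exp(-t) * (1 - t / lam) ** (t - lam)

lam = 5.0
w = binomial_pmf(20, 0.25)                 # mean 5
for t in (1, 2, 4):
    upper_exact = sum(w[5 + t:])           # P(W >= lam + t)
    lower_exact = sum(w[:5 - t + 1])       # P(W <= lam - t)
    print(t, upper_exact, upper_tail_bound(lam, t), lower_exact, lower_tail_bound(lam, t))
```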


2.4 Proofs of Theorems 2.1 and 2.2

We now give the proofs of Theorems 2.1 and 2.2. We begin with some properties of the
s-convex orderings. Our Lemmas 2.10–2.12 will make use of results established by Denuit
and Lefèvre [10] and Denuit et al. [11]. In particular, we will need closure of the s-convex
orderings under operations such as convolution and taking mixtures.

Lemma 2.10. Let $X$ and $Y$ be non-negative, integer-valued random variables. If $X \le_{s\text{-cx}}
Y$ for some $s \in \mathbb{N}$ then $T_\alpha X \le_{s\text{-cx}} T_\alpha Y$ for all $\alpha \in [0,1]$.

Proof. To see this, use Property 4.6 of [11] and a proof analogous to that of Theorem
8.A.13 of [25]. □

Lemma 2.11. Let $W$ be a non-negative, integer-valued random variable with positive
mean. If we have $W^* \le_{s\text{-cx}} W + 1$ for some $s \in \mathbb{N}$ then $(T_\alpha W)^* \le_{s\text{-cx}} T_\alpha W + 1$ for all
$\alpha \in [0,1]$.

Proof. Using Lemma 2.10 and the closure of the s-convex orders under convolution
[10, Proposition 3.7], we have that
\[ T_\alpha W + 1 \ge_{s\text{-cx}} T_\alpha (W^* - 1) + 1 = T_\alpha (V W) + 1 , \]
where the operator $V$ is defined by $V W = W^* - 1$. Since the operators $T_\alpha$ and $V$ commute
(as can be easily checked) we obtain
\[ T_\alpha W + 1 \ge_{s\text{-cx}} V(T_\alpha W) + 1 = (T_\alpha W)^* , \]
as required. □


Lemma 2.12. Let $X_1$ and $X_2$ be independent non-negative, integer-valued random
variables with positive mean. If $X_1^* \le_{s\text{-cx}} X_1 + 1$ and $X_2^* \le_{s\text{-cx}} X_2 + 1$ for some $s \in \mathbb{N}$ then
$(X_1 + X_2)^* \le_{s\text{-cx}} X_1 + X_2 + 1$.

Proof. We firstly note that $(X_1 + X_2)^* = X_1 + X_2 - X_I + X_I^*$, where the random index
$I \in \{1, 2\}$ is chosen independently of all else and such that
\[ \mathbb{P}(I = 1) = 1 - \mathbb{P}(I = 2) = \frac{\mathbb{E}X_1}{\mathbb{E}X_1 + \mathbb{E}X_2} . \]
See j^, Corollary 2.1], for example. Conditioning on the event that $I = 1$, we have
\[ (X_1 + X_2)^* = X_1^* + X_2 \le_{s\text{-cx}} X_1 + X_2 + 1 \]
by assumption and using Proposition 3.7 of [10]. An analogous argument holds if we
condition instead on the event that $I = 2$. To complete the proof we remove the conditioning
using Proposition 3.7 of [10]. □


We are now in a position to give the proof of Theorem 2.1. Noting that Poisson
random variables trivially satisfy the ordering (2.1) for all $s \in \mathbb{N}$, Lemmas 2.11 and 2.12
may be combined to give us that for $W$ satisfying the assumptions of our theorem,
\[ W_\alpha^* \le_{s\text{-cx}} W_\alpha + 1 \tag{2.10} \]
for all $\alpha \in [0,1]$.

Now, following Lefèvre and Utev [18], we let $h_0(X, j) = \mathbb{P}(X = j)$ for any non-negative,
integer-valued random variable $X$ and $j \in \mathbb{Z}^+$. We define $h_k(X, j)$ for $k \ge 1$ by letting
\[ h_k(X, j) = \sum_{i=j}^{\infty} h_{k-1}(X, i) . \tag{2.11} \]
By Proposition 2.5 of [18], to prove that $W \le_{(s+1)\text{-cx}} Z_\lambda$, we need to show that
\[ \mathbb{E}\binom{W}{k} \le \mathbb{E}\binom{Z_\lambda}{k} , \qquad k = 1, \ldots, s , \tag{2.12} \]
and that
\[ h_{s+1}(W, j) \le h_{s+1}(Z_\lambda, j) , \qquad j \ge s + 1 . \tag{2.13} \]
Beginning with (2.12), the inequality with $k = 1$ is trivial, since $\mathbb{E}W = \mathbb{E}Z_\lambda = \lambda$. In the case
$k = 2$, it is straightforward to show, using (1.3), that if $\mathbb{E}W^* \le \mathbb{E}W + 1$ (which holds
by the assumption that $W^* \le_{s\text{-cx}} W + 1$) then $\mathbb{E}\binom{W}{2} \le \mathbb{E}\binom{Z_\lambda}{2}$. The remaining cases,
$k = 3, \ldots, s$, are covered explicitly in the statement of Theorem 2.1.

It remains only to establish (2.13). Lemma 1.1 gives us that
\[ \frac{\partial}{\partial\alpha} h_0(W_\alpha, j) = \frac{\lambda}{\alpha}\, \Delta\left[ h_0(W_\alpha + 1, j) - h_0(W_\alpha^*, j) \right] . \]
Applying $\Delta^{-(s+1)}$ to each side of this equation (and interchanging summation and
differentiation) we obtain
\[ \frac{\partial}{\partial\alpha} h_{s+1}(W_\alpha, j) = -\frac{\lambda}{\alpha} \left[ h_s(W_\alpha + 1, j) - h_s(W_\alpha^*, j) \right] . \]
By the stochastic ordering (2.10) and Proposition 2.5 of [18], $h_s(W_\alpha + 1, j) \ge h_s(W_\alpha^*, j)$
for all $\alpha$ and $j$. Hence, letting $j \in \mathbb{Z}^+$,
\[ 0 \le \int_0^1 \frac{\lambda}{\alpha} \left[ h_s(W_\alpha + 1, j) - h_s(W_\alpha^*, j) \right] d\alpha = -\int_0^1 \frac{\partial}{\partial\alpha} h_{s+1}(W_\alpha, j)\, d\alpha = h_{s+1}(Z_\lambda, j) - h_{s+1}(W, j) , \]
as required, since $W_1$ is equal in distribution to $W$ and $W_0 \sim \mathrm{Po}(\lambda)$. This establishes our
Theorem 2.1.

The proof of Theorem 2.2 is exactly as for Theorem 2.1 above (with $s = 1$), except
for a change in the limits of integration.


2.5 Remarks on some related results

We conclude Section 2 by noting some results related to Theorem 2.2. Before stating
these, we need a definition. We recall that random variables $\{X_i : i \in \Gamma\}$ are negatively
associated if
\[ \mathrm{Cov}\left( f(X_i, i \in \Gamma_1), g(X_i, i \in \Gamma_2) \right) \le 0 , \]
for all increasing functions $f$ and $g$ and all $\Gamma_1, \Gamma_2 \subseteq \Gamma$ with $\Gamma_1 \cap \Gamma_2 = \emptyset$. Negative
association is closely related to other concepts of negative dependence we have used.
For example, note that negatively associated indicator random variables are negatively
related, and hence sums of negatively associated indicator variables satisfy our stochastic
ordering assumption (2.1) with $s = 1$.

Shao shows that if $X_1, \ldots, X_n$ are negatively associated and if the random variables
$X_1', \ldots, X_n'$ are independent with each of the $X_i'$ having the same marginal distribution
as $X_i$ then
\[ X_1 + \cdots + X_n \le_{cx} X_1' + \cdots + X_n' . \tag{2.14} \]
In the case where $X_1, \ldots, X_n$ are indicator random variables, the stochastic comparison
(2.14) with the sum of independent random variables is a stronger result than our Theorem
2.2, in which the comparison is with a Poisson variable. We note, though, that our results
apply in a more general negative dependence setting, and that we obtain results for the
more general s-convex orderings (as in our Theorem 2.1).

Results analogous to (2.14) are also available in a positive dependence setting. Recall
that random variables $\{X_i : i \in \Gamma\}$ are associated if
\[ \mathrm{Cov}\left( f(X_i, i \in \Gamma), g(X_i, i \in \Gamma) \right) \ge 0 , \]
for all increasing functions $f$ and $g$. Denuit et al. show that if $X_1, \ldots, X_n$ are associated
then
\[ X_1 + \cdots + X_n \ge_{cx} X_1' + \cdots + X_n' . \]
In the course of this work, we have been unable to find results analogous to our own in a
positive dependence setting, such as for sums of associated random variables.

3 Further results in Poisson approximation

In Section 2 we saw how our negative dependence assumption leads to bounds in Poisson
approximation for our random variable $W$. These bounds were established using the
convex ordering given by Theorem 2.1, which was itself proved using Lemma 1.1. We use
this section to give another application of thinning and size biasing (via our Lemma 1.1)
to Poisson approximation.

We state a bound in Lemma 3.1 below which will be applied (in Section 3.1) to give
some general results in Poisson approximation which do not need any assumptions of
stochastic ordering. We will note, however, the refinements and simplifications available
in these results if we introduce the same stochastic ordering assumptions which we used
in Section 2.

In Section 3.2 we will apply Lemma 3.1 to the problem of Poisson approximation of
the mixed Poisson distribution.

Our results will be stated in terms of the distances $d_{n,p}$ defined in Section 2.2.

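Lemma 3.1. Let $W$ be a non-negative, integer-valued random variable with distribution function $F$ and $E[W] = \lambda > 0$. Let $G_\lambda$ be the distribution function of $Z_\lambda \sim \mathrm{Po}(\lambda)$. Then for $1 \le p \le \infty$ and $n \in \mathbb{Z}$

$$d_{n,p}(F, G_\lambda) \le \lambda \int_0^1 \frac{1}{\alpha}\, d_{n+1,p}\big(F_\alpha^{(1)}, F_\alpha^*\big)\, d\alpha,$$

where $F_\alpha^{(1)}$ is the distribution function of $W_\alpha + 1$ and $F_\alpha^*$ is the distribution function of $W_\alpha^*$.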

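Proof. Let $F_\alpha$ be the distribution function of $W_\alpha$. We use the definition of $d_{n,p}$ and note that $F = F_1$ and $G_\lambda = F_0$ to obtain

$$\begin{aligned}
d_{n,p}(F, G_\lambda) &= \left\| \Delta^n \int_0^1 \frac{\partial}{\partial\alpha} F_\alpha \, d\alpha \right\|_p \\
&\le \int_0^1 \left\| \Delta^n \frac{\partial}{\partial\alpha} F_\alpha \right\|_p d\alpha \\
&= \lambda \int_0^1 \frac{1}{\alpha} \left\| \Delta^{n+1} F_\alpha^{(1)} - \Delta^{n+1} F_\alpha^* \right\|_p d\alpha,
\end{aligned}$$

where the inequality follows from Minkowski's integral inequality [Appendix A] and the final line uses Lemma 1.1. □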


3.1 Poisson approximation using thinning and size biasing 

The main result of this section is Theorem 3.2 below. This contains some Poisson approximation results derived from Lemma 3.1 and also shows how these results may be combined with the same stochastic ordering assumption employed in Section 2.

To ease the notational burden in this section we will write $d_{n,p}(X, Y)$ to mean $d_{n,p}(F, G)$ if $X$ and $Y$ are random variables with distribution functions $F$ and $G$, respectively.


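Theorem 3.2. Let $W$ be a non-negative, integer-valued random variable with $E[W] = \lambda > 0$ and let $W^*$ denote the size-biased version of $W$. Let $Z_\lambda \sim \mathrm{Po}(\lambda)$.

(a) For $s \in \mathbb{Z}^+$

$$d_{-s,1}(W, Z_\lambda) \le \frac{\lambda}{1+s}\, d_{1-s,1}(W, W^* - 1), \qquad (3.1)$$

$$d_{-s,\infty}(W, Z_\lambda) \le \frac{\lambda}{s}\, d_{1-s,\infty}(W, W^* - 1), \qquad (3.2)$$

where this last inequality applies if $s \ne 0$.

(b) If, in addition, $W + 1 \ge_{s\text{-}cx} W^*$ then

$$d_{-s,1}(W, Z_\lambda) \le \frac{\lambda}{1+s}\, E\left[\binom{W+s}{s} - \frac{W}{\lambda}\binom{W+s-1}{s}\right], \qquad (3.3)$$

$$d_{-k,\infty}(W, Z_\lambda) \le \frac{2^{(s-k-1)_+}\lambda}{k}\, E\left[\binom{W+s}{s} - \frac{W}{\lambda}\binom{W+s-1}{s}\right], \qquad (3.4)$$

for $k = 1, \ldots, s+1$.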

Proof. We begin by using Corollary 2.1 of [?], and the fact that $Z_\lambda^*$ is equal in distribution to $Z_\lambda + 1$ for all $\lambda$, to note that

$$W_\alpha^* = (T_\alpha W + Z_{(1-\alpha)\lambda})^* = I_\alpha (T_\alpha W)^* + (1 - I_\alpha)(T_\alpha W + 1) + Z_{(1-\alpha)\lambda}, \qquad (3.5)$$

where $I_\alpha$ is independent of all else and $P(I_\alpha = 1) = \alpha = 1 - P(I_\alpha = 0)$.

Using the functions $h_s(X, j)$ defined by (2.11) for any non-negative random variable $X$ and $j \in \mathbb{Z}^+$, we have that for $s \in \mathbb{Z}^+$,

$$d_{-s,p}(W_\alpha + 1, W_\alpha^*) = \| h_{s+1}(W_\alpha + 1, \cdot) - h_{s+1}(W_\alpha^*, \cdot) \|_p.$$

With $W_\alpha^*$ given by (3.5), we can condition on $I_\alpha$ and $Z_{(1-\alpha)\lambda}$ to get that

$$d_{-s,p}(W_\alpha + 1, W_\alpha^*) \le \alpha\, d_{-s,p}(T_\alpha W, V(T_\alpha W)),$$

where the operator $V$ is such that $VX = X^* - 1$ for any non-negative random variable $X$. Since the operators $V$ and $T_\alpha$ commute for any $0 \le \alpha \le 1$, we have that

$$d_{-s,p}(W_\alpha + 1, W_\alpha^*) \le \alpha\, d_{-s,p}(T_\alpha W, T_\alpha(W^* - 1)). \qquad (3.6)$$

Recalling that $T_\alpha W = \sum_{i=1}^{W} \eta_i$, where $\eta_1, \eta_2, \ldots$ are iid Bernoulli variables with mean $\alpha$, in the case that $p \in \{1, \infty\}$ we may bound this latter distance using an argument analogous to that of Proposition 4.2 of Denuit and Van Bellegem [?] to get that

$$d_{-s,1}(T_\alpha W, T_\alpha(W^* - 1)) \le \alpha^{s+1} d_{-s,1}(W, W^* - 1), \qquad (3.7)$$

$$d_{-s,\infty}(T_\alpha W, T_\alpha(W^* - 1)) \le \alpha^{s} d_{-s,\infty}(W, W^* - 1). \qquad (3.8)$$

We may now complete the proof of the first part of the theorem. Combining Lemma 3.1 with (3.6) and (3.7) we have that

$$d_{-s,1}(W, Z_\lambda) \le \lambda\, d_{1-s,1}(W, W^* - 1) \int_0^1 \alpha^{s}\, d\alpha = \frac{\lambda}{1+s}\, d_{1-s,1}(W, W^* - 1).$$

Similarly, using (3.8) in place of (3.7), we have that if $s \ne 0$

$$d_{-s,\infty}(W, Z_\lambda) \le \lambda\, d_{1-s,\infty}(W, W^* - 1) \int_0^1 \alpha^{s-1}\, d\alpha = \frac{\lambda}{s}\, d_{1-s,\infty}(W, W^* - 1).$$

This completes the proof of part (a).

For part (b), we note that if $W + 1 \ge_{s\text{-}cx} W^*$ then

$$d_{1-s,1}(W, W^* - 1) = E\left[\binom{W+s}{s} - \binom{W^* - 1 + s}{s}\right].$$

Combining this with (3.1) and (1.3) gives us (3.3).

Now let $k \in \{1, \ldots, s+1\}$. Corollary 3.14 of [12] gives us that if $W + 1 \ge_{s\text{-}cx} W^*$ then

$$d_{1-k,\infty}(W, W^* - 1) \le 2^{(s-k-1)_+} E\left[\binom{W+s}{s} - \binom{W^* - 1 + s}{s}\right].$$

We obtain (3.4) when we combine this with (3.2) and (1.3). □


To illustrate this result, we consider two examples.

Example 3.3. Firstly, we return to the setting of Example 2.8 and let $W$ have a hypergeometric distribution (with notation as in Example 2.8). Then, letting

$$\epsilon = \frac{mn}{N} \left( \frac{(m+n)N - mn - N}{N(N-1)} \right)$$

and recalling that $W + 1 \ge_{st} W^*$ in this case, Theorem 3.2(b) gives $d_{-1,1}(W, Z_\lambda) \le \epsilon/2$, $d_{-2,\infty}(W, Z_\lambda) \le \epsilon/2$ and $d_{-1,\infty}(W, Z_\lambda) \le \epsilon$, where this latter metric is the stop-loss distance.
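
As an illustrative numerical check (not part of the original argument), the following Python sketch evaluates $\epsilon$ for one hypergeometric example and compares it with the stop-loss distance to the Poisson law of equal mean. It assumes that $W$ counts marked items in $n$ draws without replacement from $N$ items of which $m$ are marked, so that $\lambda = mn/N$ and $\epsilon = \lambda - \mathrm{Var}(W)$, and it takes the stop-loss distance to be $\sup_j |E(W-j)_+ - E(Z_\lambda - j)_+|$. The function names and parameter values are hypothetical.

```python
from math import comb, exp

def hypergeom_pmf(N, m, n):
    # Assumed notation: n draws without replacement from N items, m of them marked.
    return [comb(m, k) * comb(N - m, n - k) / comb(N, n)
            for k in range(min(m, n) + 1)]

def poisson_pmf(lam, K=200):
    # Poisson(lam) probabilities on {0, ..., K}; the tail beyond K is ignored.
    pmf = [exp(-lam)]
    for k in range(1, K + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def stop_loss(pmf, j):
    # E (X - j)_+ for a pmf given as a list indexed by the value of X.
    return sum((k - j) * p for k, p in enumerate(pmf) if k > j)

def check(N, m, n):
    w = hypergeom_pmf(N, m, n)
    lam = sum(k * p for k, p in enumerate(w))
    var = sum(k * k * p for k, p in enumerate(w)) - lam ** 2
    eps = (m * n / N) * ((m + n) * N - m * n - N) / (N * (N - 1))
    z = poisson_pmf(lam)
    dist = max(abs(stop_loss(w, j) - stop_loss(z, j)) for j in range(len(z)))
    return lam, var, eps, dist

lam, var, eps, dist = check(N=30, m=6, n=10)
print(f"lambda - Var(W) = {lam - var:.6f}")
print(f"epsilon         = {eps:.6f}")
print(f"stop-loss dist  = {dist:.6f}")
```

In this example the computed stop-loss distance should not exceed $\epsilon$, in line with the bound quoted above.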


Example 3.4. We consider now the Pólya distribution, which has found applications in the study of epidemics, genetics and communications. Suppose we have an urn containing $N$ balls, of which $r$ are red and $N - r$ are black. At each step, we draw a ball, note its colour and return it to the urn together with $c \ge 1$ additional balls of the same colour. Repeat this for a total of $m$ draws, and let $W$ count the number of red balls chosen in these $m$ draws. Then $W$ has a Pólya distribution with mean $\lambda = mr/N$ and variance given by

$$\sigma^2 = \frac{mr(N+cm)(N-r)}{N^2(N+c)}.$$

We use Theorem 3.2(a) to give a bound on the Wasserstein distance $d_W(W, Z_\lambda) = d_{0,1}(W, Z_\lambda)$. From that result, we have that

$$d_W(W, Z_\lambda) \le 2\lambda\, d_{TV}(W, W^* - 1) \le 2\lambda \left\{ d_{TV}(W, W^*) + d_{TV}(W, W+1) \right\},$$

where the final bound is the triangle inequality. From inequalities (5) and (20) of [?], respectively, we have that $d_{TV}(W, W^*) \le \sigma/2\lambda$ and

$$d_{TV}(W, W+1) \le \frac{1}{2\lambda(N-r)\sqrt{N+c}} \left\{ \sqrt{mr(N-r)(N+cm)} + m\sqrt{cr(N-r)} \right\}.$$

Combining these inequalities, we have the following explicit bound:

$$d_W(W, Z_\lambda) \le \sqrt{\frac{mr(N+cm)(N-r)}{N^2(N+c)}} + \frac{1}{(N-r)\sqrt{N+c}} \left\{ \sqrt{mr(N-r)(N+cm)} + m\sqrt{cr(N-r)} \right\}. \qquad (3.9)$$

Some further discussion of the Pólya distribution, and the bound (3.9), is given in Example 3.7 below.


We note that the results of Theorem 3.2 are not the only way in which our stochastic ordering assumption can be used to get a Poisson approximation result based on Lemma 3.1. For example, consider the Wasserstein distance $d_W = d_{0,1}$ and total variation distance $d_{TV} = \frac{1}{2} d_{1,1}$. An argument analogous to that used to obtain (3.6) gives us that

$$d_{TV}(W_\alpha + 1, W_\alpha^*) \le \alpha\, d_{TV}(T_\alpha W, T_\alpha(W^* - 1)).$$

Combining this with Lemma 3.1 (in the case $n = 0$) we have that

$$d_W(W, Z_\lambda) \le 2\lambda \int_0^1 d_{TV}(T_\alpha W, T_\alpha(W^* - 1))\, d\alpha.$$

If we assume that $W + 1 \ge_{st} W^*$, we may use Theorem 7 of [?] to obtain

$$d_W(W, Z_\lambda) \le 2\lambda \int_0^1 \alpha E[W + 1 - W^*]\, d\alpha = \lambda - \mathrm{Var}(W).$$

In this case, however, better bounds are available by combining Proposition 2 of [?] with Theorem 1.1 of [?]. We thus obtain

$$d_W(\mathcal{L}(W), \mathcal{L}(Z_\lambda)) \le \left( 1 \wedge \frac{8}{3\sqrt{2e\lambda}} \right) (\lambda - \mathrm{Var}(W)).$$


3.2 Poisson approximation for mixed Poisson distributions 

In this section we apply Lemma 3.1 to the case where $W \sim \mathrm{Po}(\xi)$ has a mixed Poisson distribution with positive mixture distribution $\xi$ and $E[\xi] = \lambda$. We begin by showing that in this case, $W_\alpha$ also has a mixed Poisson distribution. Note that we will not make any assumptions of stochastic or convex ordering in this section.

Lemma 3.5. If $W \sim \mathrm{Po}(\xi)$ then $W_\alpha \sim \mathrm{Po}(\alpha\xi + (1-\alpha)\lambda)$ for all $\alpha \in [0,1]$.

Proof. Elementary computations show that for $j \in \mathbb{Z}^+$

$$P(T_\alpha W = j) = \sum_{k=j}^{\infty} \binom{k}{j} \alpha^j (1-\alpha)^{k-j} P(W = k) = E\left[ \frac{e^{-\alpha\xi} (\alpha\xi)^j}{j!} \right],$$

so that $T_\alpha W \sim \mathrm{Po}(\alpha\xi)$. Since $W_\alpha$ is the convolution of $T_\alpha W$ and an independent Poisson random variable, the result follows. □
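
The thinning step in this proof can be checked numerically. The sketch below is a hedged illustration under assumed inputs: it takes a two-point mixing distribution for $\xi$ (the atoms, weights and truncation level are hypothetical choices), thins the resulting mixed Poisson pmf binomially with retention probability $\alpha$, and compares the outcome with the mixed Poisson pmf with mixing variable $\alpha\xi$.

```python
from math import comb, exp, factorial

def poisson_pmf(mu, K):
    # Poisson(mu) probabilities on {0, ..., K}.
    return [exp(-mu) * mu ** k / factorial(k) for k in range(K + 1)]

def mixed_poisson_pmf(atoms, weights, K):
    # pmf of Po(xi) when xi equals atoms[i] with probability weights[i].
    cols = [poisson_pmf(a, K) for a in atoms]
    return [sum(w * col[k] for w, col in zip(weights, cols)) for k in range(K + 1)]

def thin(pmf, alpha):
    # pmf of T_alpha W: each of the W points is kept independently with prob. alpha.
    K = len(pmf) - 1
    return [sum(comb(k, j) * alpha ** j * (1 - alpha) ** (k - j) * pmf[k]
                for k in range(j, K + 1))
            for j in range(K + 1)]

atoms, weights, alpha, K = [1.0, 4.0], [0.5, 0.5], 0.3, 80
lhs = thin(mixed_poisson_pmf(atoms, weights, K), alpha)
rhs = mixed_poisson_pmf([alpha * a for a in atoms], weights, K)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(f"largest pointwise gap: {max_gap:.2e}")
```

Up to truncation and rounding error the two pmfs should agree, which is the statement $T_\alpha W \sim \mathrm{Po}(\alpha\xi)$.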

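Now, let us write $\xi(\alpha) = \alpha\xi + (1-\alpha)\lambda$ and

$$g_\alpha(j) = \frac{\exp\{-\xi(\alpha)\}\, \xi(\alpha)^{j-1}}{(j-1)!}.$$

Since

$$P(W_\alpha + 1 = j) - P(W_\alpha^* = j) = E\left[ \left( 1 - \frac{\xi(\alpha)}{\lambda} \right) g_\alpha(j) \right],$$

Lemmas 3.1 and 3.5 give us that

$$\begin{aligned}
d_{n,p}(F, G_\lambda) &\le \int_0^1 \frac{1}{\alpha} \left\| \Delta^n E\left[ (\xi(\alpha) - \lambda)\, g_\alpha \right] \right\|_p d\alpha \\
&\le \int_0^1 E\left[ |\xi - \lambda|\, \| \Delta^n g_\alpha \|_p \right] d\alpha,
\end{aligned}$$

where we have again used Minkowski's integral inequality.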

For the remainder of this section we focus only on the case $n \ge 0$ and $p = 1$. In this case, straightforward calculations using Lemma 3.4 of [?] give us that

$$\| \Delta^n g_\alpha \|_1 \le \xi(\alpha)^{-n/2},$$

and hence

$$d_{n,1}(F, G_\lambda) \le \int_0^1 E\left[ |\xi - \lambda| (\alpha\xi + (1-\alpha)\lambda)^{-n/2} \right] d\alpha. \qquad (3.10)$$

If we assume that the expectation in (3.10) exists for all $\alpha \in [0,1]$ then we may interchange the order of integration to obtain the following result.


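Theorem 3.6. Let $W \sim \mathrm{Po}(\xi)$ for some positive random variable $\xi$ with $E[\xi] = \lambda$. Let $F$ be the distribution function of $W$ and $G_\lambda$ be the distribution function of $Z_\lambda \sim \mathrm{Po}(\lambda)$. Suppose that

$$E\left[ |\xi - \lambda| (\alpha\xi + (1-\alpha)\lambda)^{-n/2} \right] < \infty,$$

for some $n \in \mathbb{Z}^+$ and all $\alpha \in [0,1]$. Then if $n \ne 2$,

$$d_{n,1}(F, G_\lambda) \le \frac{2}{|n-2|}\, E\left| \xi^{(2-n)/2} - \lambda^{(2-n)/2} \right|,$$

while if $n = 2$, $d_{2,1}(F, G_\lambda) \le E\left| \log(\xi/\lambda) \right|$.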
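
The closed forms in Theorem 3.6 come from integrating the integrand of (3.10) over $\alpha$ for each fixed value of $\xi$. The following sketch checks that step numerically. It assumes the integrand $|\xi-\lambda|(\alpha\xi+(1-\alpha)\lambda)^{-n/2}$ and the closed form $\frac{2}{|n-2|}|\xi^{(2-n)/2}-\lambda^{(2-n)/2}|$ (with $|\log(\xi/\lambda)|$ when $n=2$); the function names, the midpoint rule and the test values are illustrative choices, not taken from the paper.

```python
from math import log

def alpha_integral(xi, lam, n, steps=20000):
    # Midpoint rule for the integral over alpha in [0, 1] of
    # |xi - lam| * (alpha*xi + (1 - alpha)*lam) ** (-n/2).
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h
        total += abs(xi - lam) * (a * xi + (1 - a) * lam) ** (-n / 2)
    return total * h

def closed_form(xi, lam, n):
    # Closed form of the same integral for a fixed value of xi.
    if n == 2:
        return abs(log(xi / lam))
    return 2.0 / abs(n - 2) * abs(xi ** ((2 - n) / 2) - lam ** ((2 - n) / 2))

for n in range(5):
    num, exact = alpha_integral(0.7, 2.0, n), closed_form(0.7, 2.0, n)
    print(n, round(num, 6), round(exact, 6))
```

Taking expectations over $\xi$ on both sides then gives the bounds stated in the theorem, provided the interchange of integration is justified.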

We illustrate this result by returning to the setting of Example 3.4, the Pólya distribution.


Example 3.7. Let $W$ have a Pólya distribution, as described in Example 3.4. We show how Theorem 3.6 may be used to give a bound on the Wasserstein distance $d_W(W, Z_\lambda) = d_{0,1}(W, Z_\lambda)$ between $W$ and a Poisson distribution of the same mean. To do this, we follow [?] and construct $W$ as the mixed binomial distribution $\mathrm{Bin}(m, \xi)$, where $\xi$ has a beta distribution with density function

$$g(t) = B(\alpha, \beta)^{-1}\, t^{\alpha-1} (1-t)^{\beta-1},$$

for $t \in (0,1)$, where $B(\cdot, \cdot)$ is the beta function, $\alpha = r/c$ and $\beta = (N-r)/c$.

Letting $Y \sim \mathrm{Po}(m\xi)$ have a mixed Poisson distribution, we may condition on $\xi$ to obtain the bound $d_W(W, Y) \le 1.15\sqrt{m}\, E[\xi^{3/2}]$ from equation (1.8) of [3]. Our Theorem 3.6 gives $d_W(Y, Z_\lambda) \le mE|\xi - p|$, where $p = E\xi = r/N$. The triangle inequality and Hölder's inequality then give

$$\begin{aligned}
d_W(W, Z_\lambda) &\le 1.15\sqrt{m}\, E[\xi^{3/2}] + mE|\xi - p| \\
&\le 1.15\sqrt{m}\, \left( E[\xi^2] \right)^{3/4} + m\sqrt{\mathrm{Var}(\xi)} \\
&= 1.15\sqrt{m} \left( \frac{r(r+c)}{N(N+c)} \right)^{3/4} + m\sqrt{\frac{cr(N-r)}{N^2(N+c)}}. \qquad (3.11)
\end{aligned}$$
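
The bound (3.11) can be compared numerically with the bound (3.9) of Example 3.4. The sketch below is only an illustration: it assumes the explicit forms $\sigma + \{\sqrt{mr(N-r)(N+cm)} + m\sqrt{cr(N-r)}\}/((N-r)\sqrt{N+c})$ with $\sigma^2 = mr(N+cm)(N-r)/(N^2(N+c))$ for (3.9), and $1.15\sqrt{m}\,(r(r+c)/(N(N+c)))^{3/4} + m\sqrt{cr(N-r)/(N^2(N+c))}$ for (3.11). The function names and the parameter values are hypothetical.

```python
from math import sqrt

def bound_3_9(N, r, c, m):
    # Assumed form of (3.9): standard deviation term plus a shift term.
    sigma = sqrt(m * r * (N + c * m) * (N - r) / (N ** 2 * (N + c)))
    shift = (sqrt(m * r * (N - r) * (N + c * m)) + m * sqrt(c * r * (N - r))) \
        / ((N - r) * sqrt(N + c))
    return sigma + shift

def bound_3_11(N, r, c, m):
    # Assumed form of (3.11): binomial-to-mixed-Poisson term plus mixing term.
    second_moment = r * (r + c) / (N * (N + c))       # E[xi^2] for the beta mixing law
    variance = c * r * (N - r) / (N ** 2 * (N + c))   # Var(xi) for the beta mixing law
    return 1.15 * sqrt(m) * second_moment ** 0.75 + m * sqrt(variance)

N, r, c, m = 100, 5, 1, 50
b39, b311 = bound_3_9(N, r, c, m), bound_3_11(N, r, c, m)
print(f"(3.9):  {b39:.4f}")
print(f"(3.11): {b311:.4f}")
```

For this single parameter choice the second bound is the smaller of the two; other parameter values may of course behave differently.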


Asymptotically, the bound (3.11) behaves similarly to the bound (3.9) derived in Example 3.4 above. For example, if $m$ is of order $O(N)$ and $c$ and $r$ are both of order $O(1)$, then each of the bounds (3.9) and (3.11) is of order $O(1)$. However, numerical studies suggest that in practice (3.11) performs better than (3.9).

In the case of the total variation distance $d_{TV} = \frac{1}{2} d_{1,1}$, Theorem 3.6 gives the following.


Corollary 3.8. Let $F$ and $G_\lambda$ be as in Theorem 3.6. For any $\epsilon \in [0, 1/2]$


dTvi.Fi G\) < 




Proof. From Theorem 3.6 we have that $d_{TV}(F, G_\lambda) \le E|\sqrt{\xi} - \sqrt{\lambda}|$. This expectation may be bounded using Lemma 1 of [23] and Hölder's inequality to give the required result. □


We note, however, that bounds superior to that given by Corollary 3.8 may be available elsewhere. For example, consider the case where $W$ has a negative binomial distribution. That is, assume that $\xi$ has a gamma distribution with density function

$$g(t) = \frac{1}{\Gamma(\beta)} \left( \frac{1-q}{q} \right)^{\beta} t^{\beta-1} e^{-(1-q)t/q},$$

for $t > 0$, for some $\beta \in (0, \infty)$ and $q \in (0,1)$. Note that $\lambda = \beta q (1-q)^{-1}$, $\mathrm{Var}(\xi) = \beta q^2 (1-q)^{-2}$ and

$$E|\xi - \lambda| \le \frac{2q\beta^{\beta} e^{-\beta}}{(1-q)\Gamma(\beta)},$$

where this inequality uses a slight generalisation of Proposition A.2.9 of [?] whose proof is straightforward. Thus, evaluating the bound of Corollary 3.8, and in particular with the choices $\epsilon = 0$ and $\epsilon = 1/2$, we obtain that in the negative binomial case

$$d_{TV}(F, G_\lambda) \le \sqrt{\frac{q}{1-q}}\, \min\left\{ \sqrt{\frac{2}{\pi}},\ \left( \frac{2\beta}{\pi} \right)^{1/4} \right\}. \qquad (3.12)$$

For comparison, Roos [23] obtains the bound

$$d_{TV}(F, G_\lambda) \le \frac{\beta q^2}{(1-q)^2}\, \min\left\{ \frac{3(1-q)}{4e\beta q},\ 1 \right\}, \qquad (3.13)$$

and shows that it is superior to many others available in the literature. Note that, regardless of the value of $\beta$, the bound (3.12) is of order $O(\sqrt{q})$ while (3.13) has order at least as good as $O(q)$.


4 The binomial case 

The results that we have stated in previous sections (based on Lemma 1.1) are closely related to the Poisson distribution, since Lemma 1.1 is itself closely related to the Poisson distribution. In this section we turn our attention to results in the binomial case. We consider results analogous to those in Sections 2 and 3. In doing this, we will use a Markov chain constructed by Yu [28] and used in proving an upper bound on entropy.

We begin with some useful definitions. Throughout this section let $W$ be a non-negative, integer-valued random variable supported on $\{0, 1, \ldots, n\}$, for some integer $n > 0$, and with mean $\lambda = nr > 0$. We will let $Z \sim \mathrm{Bin}(n, r)$, a binomial random variable with the same support and mean as $W$.

We recall that a random variable $X$ supported on $\{0, 1, \ldots, n\}$ is ultra log-concave of degree $n$, denoted ULC(n) in the sequel, if

$$\frac{P(X = i+1)^2}{\binom{n}{i+1}^2} \ge \frac{P(X = i)}{\binom{n}{i}} \cdot \frac{P(X = i+2)}{\binom{n}{i+2}},$$

for $0 \le i \le n-2$. We refer the reader to [20] for further discussion of this property. We note here that the ULC(n) property is intended to capture negative dependence, in a similar way to the ULC($\infty$) property and the other negative dependence assumptions we have discussed in Section 2.
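
The ULC(n) condition is straightforward to test for a given probability mass function. The following minimal sketch (function names and the tolerance are illustrative choices) checks the defining inequality directly and applies it to a binomial law, to a sum of independent Bernoulli variables with unequal success probabilities, and to a two-point law that fails the condition.

```python
from math import comb

def is_ulc(pmf, tol=1e-12):
    # pmf[i] = P(X = i) for i = 0, ..., n; checks q_{i+1}^2 >= q_i * q_{i+2}
    # where q_i = P(X = i) / C(n, i), allowing for rounding error via tol.
    n = len(pmf) - 1
    q = [pmf[i] / comb(n, i) for i in range(n + 1)]
    return all(q[i + 1] ** 2 >= q[i] * q[i + 2] - tol for i in range(n - 1))

def bernoulli_sum_pmf(ps):
    # pmf of a sum of independent Bernoulli(p) variables, one for each p in ps.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += (1 - p) * mass
            new[k + 1] += p * mass
        pmf = new
    return pmf

def binomial_pmf(n, r):
    return [comb(n, k) * r ** k * (1 - r) ** (n - k) for k in range(n + 1)]

print(is_ulc(binomial_pmf(5, 0.3)))             # binomial: equality in every inequality
print(is_ulc(bernoulli_sum_pmf([0.1, 0.5, 0.9])))
print(is_ulc([0.5, 0.0, 0.5]))                  # mass only at the endpoints
```

The binomial law attains equality throughout, which is why a small tolerance is used in the comparison.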

For those $W$ which are ULC(n), Yu [28, Theorem 1] proves that $H(W) \le H(Z)$. This is an analogue of Theorem 2.5 of [?], which we generalised in our Corollary 2.3. The proof of Yu's result employs a Markov chain $\{X_t : t \in \mathbb{Z}^+\}$, whose construction we now outline. Further details and discussion are provided by [28].

We let $X_0$ have the same distribution as $W$. The random variable $X_t$ (for $t \ge 1$) is given by

$$X_t = H_{n+1}(X_{t-1} + \eta_{t-1}), \qquad (4.1)$$

where $\eta_0, \eta_1, \ldots$ are iid Bernoulli random variables with mean $r$ and the operator $H_n$ is such that for a non-negative, integer-valued random variable $X$ supported on $\{0, 1, \ldots, n\}$

$$P(H_n X = i) = \frac{n-i}{n} P(X = i) + \frac{i+1}{n} P(X = i+1),$$

for $0 \le i \le n-1$. The operator $H_n$ is referred to as hypergeometric thinning, since, conditional on $X$, $H_n X$ has a hypergeometric distribution. This is the analogue of the (binomial) thinning operator defined in Section 1. Recall that, conditional on $X$, $T_\alpha X$ has a binomial distribution.

In proving his entropy bound, Yu [28] uses the random variables $\{X_t : t \in \mathbb{Z}^+\}$ in a role analogous to that of the random variables $\{W_\alpha : 0 \le \alpha \le 1\}$ in the corresponding bound for the Poisson case [?, Theorem 2.5]. We use the remainder of this section to examine how the techniques we have developed in our previous work may be carried over into this binomial setting. We begin with the analogue of Lemma 1.1.

Writing $p_t(i) = P(X_t = i)$, Yu shows that for $t \ge 0$

$$p_{t+1}(i) = \frac{(n+1-i)\left( s p_t(i) + r p_t(i-1) \right) + (i+1)\left( s p_t(i+1) + r p_t(i) \right)}{n+1}, \qquad (4.2)$$

where $s = 1 - r$. We note that $X_t$ is supported on $\{0, 1, \ldots, n\}$ and has expectation $nr$ for each $t \in \mathbb{Z}^+$. The key property of this Markov chain is that as $t \to \infty$, $X_t$ converges in distribution to the binomial distribution $\mathrm{Bin}(n, r)$.
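
The recursion (4.2) is easy to iterate numerically. The sketch below is an illustration under assumed inputs (a point mass at $nr$ as starting law, and hypothetical function names): it applies one step of the recursion repeatedly and compares the result with the $\mathrm{Bin}(n, r)$ probability mass function, and it also checks that the mean is preserved by a single step.

```python
from math import comb

def yu_step(p, r):
    # One step of the recursion (4.2): p[i] = P(X_t = i) for i = 0, ..., n.
    n = len(p) - 1
    s = 1 - r
    def at(i):
        return p[i] if 0 <= i <= n else 0.0
    return [((n + 1 - i) * (s * at(i) + r * at(i - 1))
             + (i + 1) * (s * at(i + 1) + r * at(i))) / (n + 1)
            for i in range(n + 1)]

def binomial_pmf(n, r):
    return [comb(n, k) * r ** k * (1 - r) ** (n - k) for k in range(n + 1)]

def run_chain(p0, r, steps):
    p = list(p0)
    for _ in range(steps):
        p = yu_step(p, r)
    return p

n, r = 4, 0.5
start = [0.0, 0.0, 1.0, 0.0, 0.0]          # point mass at nr = 2
limit = run_chain(start, r, 300)
gap = max(abs(a - b) for a, b in zip(limit, binomial_pmf(n, r)))
mean_after_one = sum(k * q for k, q in enumerate(yu_step(start, r)))
print(f"distance to Bin(n, r) after 300 steps: {gap:.2e}")
print(f"mean after one step: {mean_after_one:.6f}")
```

After a few hundred steps the iterate should be numerically indistinguishable from the binomial law in this small example.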

Now, given a random variable $W$ supported on $\{0, 1, \ldots, n\}$, define the random variable $W^+$ by

$$P(W^+ = j) = \frac{n+1-j}{n(1-r)}\, P(W + 1 = j),$$

for $1 \le j \le n$. Straightforward manipulations of (4.2) then allow us to see the following result, analogous to our Lemma 1.1 for the Poisson case.

Lemma 4.1. Let $W$ be a random variable supported on $\{0, 1, \ldots, n\}$ with mean $nr > 0$. Then for $t \in \mathbb{Z}^+$ and $0 \le j \le n$

$$P(X_t = j) - P(X_{t+1} = j) = \frac{nr(1-r)}{n+1}\, \Delta\left[ P(X_t^+ = j) - P(X_t^* = j) \right].$$
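
An identity of this kind can be verified numerically for a single step of the chain. The sketch below is a hedged check under stated assumptions: it takes $\Delta$ to be the forward difference $\Delta f(j) = f(j+1) - f(j)$, uses $P(X^+ = j) = (n+1-j)P(X = j-1)/(n(1-r))$ and the size-biased law $P(X^* = j) = jP(X = j)/(nr)$, and computes one step of the chain from the recursion (4.2). The function names and the test pmf are hypothetical.

```python
def yu_step(p, r):
    # One step of the recursion (4.2): p[i] = P(X_t = i) for i = 0, ..., n.
    n = len(p) - 1
    s = 1 - r
    def at(i):
        return p[i] if 0 <= i <= n else 0.0
    return [((n + 1 - i) * (s * at(i) + r * at(i - 1))
             + (i + 1) * (s * at(i + 1) + r * at(i))) / (n + 1)
            for i in range(n + 1)]

def lemma_4_1_gap(p):
    # Largest absolute difference between the two sides of the identity
    # P(X_t = j) - P(X_{t+1} = j) = c * Delta[P(X_t^+ = j) - P(X_t^* = j)],
    # with c = n r (1 - r) / (n + 1) and Delta the forward difference (assumed).
    n = len(p) - 1
    r = sum(k * q for k, q in enumerate(p)) / n
    # pmfs of X^+ and X^* written on {0, ..., n + 1}
    plus = [0.0] + [(n + 1 - j) * p[j - 1] / (n * (1 - r)) for j in range(1, n + 1)] + [0.0]
    star = [j * p[j] / (n * r) for j in range(n + 1)] + [0.0]
    diff = [a - b for a, b in zip(plus, star)]
    nxt = yu_step(p, r)
    const = n * r * (1 - r) / (n + 1)
    return max(abs((p[j] - nxt[j]) - const * (diff[j + 1] - diff[j]))
               for j in range(n + 1))

print(lemma_4_1_gap([0.1, 0.2, 0.3, 0.25, 0.15]))
```

The printed gap should be zero up to rounding error for any pmf on $\{0, \ldots, n\}$ with mean strictly between $0$ and $n$.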


4.1 Convex ordering and ULC(n) 

We use the next part of this section to explore stochastic ordering properties similar to those considered previously in Section 2. We will make use of ultra log-concavity, and will assume that $W$ is ULC(n). For such $W$ we have that $W^+ \ge_{st} W^*$ and that $X_t$ is ULC(n) for all $t \in \mathbb{Z}^+$. See [?, Lemma 3]. Combining these facts we immediately see that if $W$ is ULC(n) then $X_t^+ \ge_{st} X_t^*$ for all $t \in \mathbb{Z}^+$. We may then derive the following result, which plays the role of Theorem 2.2 in the binomial case.

Theorem 4.2. Let $W$ be ULC(n) with support $\{0, 1, \ldots, n\}$ and mean $nr > 0$. Let $Z \sim \mathrm{Bin}(n, r)$ and $X_t$ be given by (4.1). Then $X_t \le_{cx} X_u$ for all $t \le u$. In particular, $W \le_{cx} Z$.


Proof. We use the ideas and notation of the proof of Theorem 2.1. As in the proof of that result, Proposition 2.5 of [18] gives us that we need only show that $h_2(X_t, j) \le h_2(X_{t+1}, j)$ for each $t \in \mathbb{Z}^+$ and $0 \le j \le n$. The first statement in the theorem follows easily from this, and the final statement by taking $t = 0$ and $u \to \infty$ in the first.

As noted before, for $W$ a ULC(n) random variable, we have that $X_t^+ \ge_{st} X_t^*$ for each $t \in \mathbb{Z}^+$. Hence $h_1(X_t^+, j) \ge h_1(X_t^*, j)$ for all $t \in \mathbb{Z}^+$ and $0 \le j \le n$.

Now, by Lemma 4.1 we have that

$$h_0(X_t, j) - h_0(X_{t+1}, j) = \frac{nr(1-r)}{n+1}\, \Delta\left[ h_0(X_t^+, j) - h_0(X_t^*, j) \right],$$

for each $t \in \mathbb{Z}^+$ and $0 \le j \le n$. Applying $\Delta^{-2}$ to this, we have that

$$\frac{nr(1-r)}{n+1} \left[ h_1(X_t^+, j) - h_1(X_t^*, j) \right] = h_2(X_{t+1}, j) - h_2(X_t, j) \ge 0,$$

as required. □


From Theorem 4.2 we may immediately recover the main result of Yu [28, Theorem 1], which we state in Corollary 4.3 below.


Corollary 4.3. Let $W$ be ULC(n) with support $\{0, 1, \ldots, n\}$ and mean $nr > 0$. Let $Z \sim \mathrm{Bin}(n, r)$. Then

$$H(W) \le H(Z).$$

Proof. Since $W \le_{cx} Z$ (by Theorem 4.2) and $Z$ is a log-concave random variable, this follows immediately from Lemma 1 of [?]. □
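
Both the convex ordering and the entropy comparison can be illustrated on a small example. The sketch below assumes $W$ is a sum of independent Bernoulli variables with unequal success probabilities (such sums are ULC(n)), and checks the convex order through the stop-loss transforms $E(X-j)_+$, which suffices here because the two laws have equal means. The particular probabilities and the function names are illustrative choices.

```python
from math import comb, log

def bernoulli_sum_pmf(ps):
    # pmf of a sum of independent Bernoulli(p) variables, one for each p in ps.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += (1 - p) * mass
            new[k + 1] += p * mass
        pmf = new
    return pmf

def binomial_pmf(n, r):
    return [comb(n, k) * r ** k * (1 - r) ** (n - k) for k in range(n + 1)]

def entropy(pmf):
    # Shannon entropy in nats.
    return -sum(p * log(p) for p in pmf if p > 0)

def stop_loss(pmf, j):
    # E (X - j)_+ for a pmf given as a list indexed by the value of X.
    return sum((k - j) * p for k, p in enumerate(pmf) if k > j)

ps = [0.1, 0.3, 0.5, 0.9]
n, r = len(ps), sum(ps) / len(ps)
w, z = bernoulli_sum_pmf(ps), binomial_pmf(n, r)
print(f"H(W) = {entropy(w):.4f}, H(Z) = {entropy(z):.4f}")
print(all(stop_loss(w, j) <= stop_loss(z, j) + 1e-12 for j in range(n + 1)))
```

In this example the entropy of $W$ is strictly smaller than that of the binomial law with the same mean.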

We also have the following, the analogue of Corollary 2.4.

Corollary 4.4. Let $W$ be ULC(n) with support $\{0, 1, \ldots, n\}$ and mean $nr > 0$. Let $Z \sim \mathrm{Bin}(n, r)$ and $Y_1, Y_2, \ldots$ be iid non-negative, integer-valued random variables. Let

$$\widetilde{W} = \sum_{i=1}^{W} Y_i \quad \text{and} \quad \widetilde{Z} = \sum_{i=1}^{Z} Y_i.$$

If $\widetilde{Z}$ is log-concave, then $H(\widetilde{W}) \le H(\widetilde{Z})$.

Proof. Combine our Theorem 4.2 with Theorem 1 of [29]. □

Note that Corollary 4.4 generalises Theorem 2 of [?], since a sum of $n$ independent Bernoulli random variables is ULC(n).

We conclude this subsection by observing that we may also obtain concentration inequalities and binomial approximation results as corollaries of our Theorem 4.2, as in the Poisson case of Section 2. The proofs of these results are analogous to their Poisson counterparts in Section 2.


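Corollary 4.5. Let $W$ be ULC(n) with support $\{0, 1, \ldots, n\}$ and mean $\lambda = nr > 0$. Let $t > 0$. Then

$$P(W \ge \lambda + t) \le \left[ \frac{(1-r)(\lambda+t)}{(1-r)\lambda - rt} \right]^{-(t+\lambda)} \left[ 1 - r + \frac{r(1-r)(\lambda+t)}{(1-r)\lambda - rt} \right]^{n},$$

$$P(W \le \lambda - t) \le \left[ \frac{(1-r)(\lambda-t)}{(1-r)\lambda + rt} \right]^{t-\lambda} \left[ 1 - r + \frac{r(1-r)(\lambda-t)}{(1-r)\lambda + rt} \right]^{n},$$

where the last inequality applies if $t < \lambda$.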
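
These tail bounds can be evaluated and compared with exact tail probabilities in a small case. The sketch below is an illustration under assumptions: it takes $W$ to be a sum of independent Bernoulli variables (hence ULC(n)), uses the bounds in the form $\rho^{-(t+\lambda)}(1-r+r\rho)^n$ with $\rho = (1-r)(\lambda+t)/((1-r)\lambda - rt)$ for the upper tail and the analogous expression with $\rho = (1-r)(\lambda-t)/((1-r)\lambda + rt)$ for the lower tail, and requires $t < n - \lambda$ so that the upper-tail denominator is positive. Function names and parameter values are hypothetical.

```python
def bernoulli_sum_pmf(ps):
    # pmf of a sum of independent Bernoulli(p) variables, one for each p in ps.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += (1 - p) * mass
            new[k + 1] += p * mass
        pmf = new
    return pmf

def upper_bound(n, r, t):
    # Assumed form of the bound on P(W >= lam + t); needs (1 - r) * lam - r * t > 0.
    lam = n * r
    rho = (1 - r) * (lam + t) / ((1 - r) * lam - r * t)
    return rho ** (-(t + lam)) * (1 - r + r * rho) ** n

def lower_bound(n, r, t):
    # Assumed form of the bound on P(W <= lam - t); needs 0 < t < lam.
    lam = n * r
    rho = (1 - r) * (lam - t) / ((1 - r) * lam + r * t)
    return rho ** (t - lam) * (1 - r + r * rho) ** n

ps = [0.1, 0.3, 0.5, 0.9]
n, r, t = len(ps), sum(ps) / len(ps), 1.0
lam = n * r
w = bernoulli_sum_pmf(ps)
upper_tail = sum(p for k, p in enumerate(w) if k >= lam + t - 1e-12)
lower_tail = sum(p for k, p in enumerate(w) if k <= lam - t + 1e-12)
print(f"P(W >= lam + t) = {upper_tail:.4f} <= {upper_bound(n, r, t):.4f}")
print(f"P(W <= lam - t) = {lower_tail:.4f} <= {lower_bound(n, r, t):.4f}")
```

The bounds are of Chernoff type, so they are typically far from tight for such a small $n$; the point of the sketch is only to show how they are evaluated.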

Corollary 4.6. Let $W$ be ULC(n) with support $\{0, 1, \ldots, n\}$ and mean $nr > 0$. Let $Z \sim \mathrm{Bin}(n, r)$. Then if $W$ has distribution function $F$ and $Z$ has distribution function $G$,

d.k,oo{F,G) < {nr{l - r) - Var{W)} , 


for $k \in \{-1, 0, 1, 2\}$.

4.2 Other results in the binomial case 

In Section 3 we used our Lemma 1.1 directly to provide a Poisson approximation result, Lemma 3.1. Similarly, we have the following.

Proposition 4.7. Let $W$ be a random variable supported on $\{0, 1, \ldots, n\}$ with mean $nr > 0$. Let $F_t$ be the distribution function of $X_t$, for $t \in \mathbb{Z}^+$. Then for $1 \le p \le \infty$ and $n \in \mathbb{Z}$

$$d_{n,p}(F_0, F_t) \le \frac{nr(1-r)}{n+1} \sum_{u=0}^{t-1} d_{n+1,p}(F_u^+, F_u^*),$$

where $F_u^+$ is the distribution function of $X_u^+$ and $F_u^*$ is the distribution function of $X_u^*$.

Proof. From the definition of $d_{n,p}$ we have that

$$\begin{aligned}
d_{n,p}(F_0, F_t) &= \left\| \sum_{u=0}^{t-1} \Delta^n \left( F_u - F_{u+1} \right) \right\|_p \\
&= \frac{nr(1-r)}{n+1} \left\| \sum_{u=0}^{t-1} \left( \Delta^{n+1} F_u^+ - \Delta^{n+1} F_u^* \right) \right\|_p \\
&\le \frac{nr(1-r)}{n+1} \sum_{u=0}^{t-1} \left\| \Delta^{n+1} F_u^+ - \Delta^{n+1} F_u^* \right\|_p,
\end{aligned}$$

where the second line uses Lemma 4.1 and the inequality uses Minkowski's integral inequality. □


It is worth noting, however, that we do not have a result analogous to Lemma 3.5 here. That is, suppose that $X_0 = W \sim \mathrm{Bin}(n, \xi)$ for some random variable $\xi$ supported on $[0,1]$, so that

$$P(W = i) = \binom{n}{i} E\left[ \xi^i (1-\xi)^{n-i} \right].$$

Then $X_1$ does not, in general, have a mixed binomial distribution. In the Poisson case, the preservation of Poisson mixtures under the operators $U_\alpha$ ($0 \le \alpha \le 1$), as given by Lemma 3.5, allowed us to easily and explicitly find a bound on the distance between a mixed Poisson random variable and a Poisson random variable with the same mean. However, no such property holds in the binomial case we are considering here.